DETAILED ACTION
Priority 

Applicant's claim for domestic priority under 35 U.S.C. 119(e) is acknowledged.

Specification 

The lengthy specification has not been checked to the extent necessary to 
determine the presence of all possible minor errors. Applicant's cooperation is 
requested in correcting any errors of which applicant may become aware in the
specification. 

Information Disclosure Statement 

The information disclosure statements filed on 3/23/2007 and 3/3/2008 have 
been considered. Initialed copies are enclosed. 

Claim Rejections - 35 USC §112 

The following is a quotation of the first paragraph of 35 U.S.C. 112: 
The specification shall contain a written description of the invention, and of the
manner and process of making and using it, in such full, clear, concise, and exact
terms as to enable any person skilled in the art to which it pertains, or with which
it is most nearly connected, to make and use the same and shall set forth the
best mode contemplated by the inventor of carrying out his invention.


Claims 1-11, 13, 15, 17, 19, 20-23, and 25-30 are rejected under 35 U.S.C. 112,
first paragraph, as failing to comply with the written description requirement. The
claim(s) contain(s) subject matter which was not described in the specification in such a
way as to reasonably convey to one skilled in the relevant art that the inventor(s), at the 
time the application was filed, had possession of the claimed invention. This is a written 
description rejection. 

The instant claims recite a method in which step (b) includes the recitation "target
polynucleotide being a sequence operatively linked to a promoter native to S. cerevisiae
gene YMR325W, or a YMR325W promoter sequence homolog comprising one or more
nucleotide substitutions, additions or deletions that do not effect the ability of the
sequence to promote transcription of said operatively linked sequence". However, for a
promoter to function there must be elements present which guide transcription. While
many promoters and the elements therefrom were known in the art, the promoters
of the present invention must be associated with a gene which encodes an enzyme or
regulator in the sterol biosynthesis pathway to be functional in a useful way in the
invention. The term "homolog" is not defined in the specification and does not have a
precise meaning in the art (see the rejection, below, under 35 USC § 112, second
paragraph); it is thus interpreted as reading upon any promoter possessing any
degree of similarity to the specifically recited promoters, which represents a vast
genus. Furthermore, there is no description in the specification of any structural features
that would permit any given promoter to function in a relevant way in the present
invention. As such, the genus of potential promoter sequences

Furthermore, the instant claims are drawn to a vast genus of homologs of
SEQ ID NO: 3. To fulfill the written description requirement set forth under 35 USC §
112, first paragraph, the specification must describe at least a substantial number of the
members of the claimed genus, or alternatively describe a representative member of the
claimed genus which shares a particularly defining feature common to at least a
substantial number of the members of the claimed genus, which would enable the
skilled artisan to immediately recognize and distinguish its members from others, so as
to reasonably convey to the skilled artisan that Applicant had possession of the claimed
invention. To adequately describe the genus of homologs of SEQ ID NO: 3,
applicant must also give a functional limitation of the homologs of SEQ ID NO: 3.

The specification, however, does not disclose distinguishing and identifying
features of a representative member of the genus of homologs of SEQ ID NO: 3
to which the claims are drawn, such as a correlation between the structure of the peptide
and its recited function, so that the skilled artisan could immediately envision or
recognize at least a substantial number of members of the claimed genus of homologs
of SEQ ID NO: 3.

MPEP § 2163.02 states, "an objective standard for determining compliance with
the written description requirement is, 'does the description clearly allow persons of
ordinary skill in the art to recognize that he or she invented what is claimed.'" The courts
have decided: The purpose of the "written description" requirement is broader than to
merely explain how to "make and use"; the applicant must convey with reasonable
clarity to those skilled in the art that, as of the filing date sought, he or she was in
possession of the invention. The invention is, for purposes of the "written description"
inquiry, whatever is now claimed. See Vas-Cath, Inc. v. Mahurkar, 935 F.2d 1555, 1563-64,
19 USPQ2d 1111, 1117 (Federal Circuit, 1991). Furthermore, the written
description provision of 35 USC § 112 is severable from its enablement provision, and
adequate written description requires more than a mere statement that it is part of the
invention and reference to a potential method for isolating it. See Fiers v. Revel, 25
USPQ2d 1601, 1606 (CAFC 1993) and Amgen Inc. v. Chugai Pharmaceutical Co. Ltd.,
18 USPQ2d 1016.

The Guidelines for Examination of Patent Applications Under the 35 U.S.C. 112,
paragraph 1, "Written Description" Requirement (66 FR 1099-1111, January 5, 2001)
state, "[p]ossession may be shown in a variety of ways including description of an actual
reduction to practice, or by showing the invention was 'ready for patenting' such as by
disclosure of drawings or structural chemical formulas that show that the invention was
complete, or by describing distinguishing identifying characteristics sufficient to show
that the applicant was in possession of the claimed invention" (Id. at 1104).

The Guidelines further state, "[f]or inventions in an unpredictable art, adequate 
written description of a genus which embraces widely variant species cannot be 
achieved by disclosing only one species within the genus" (Id. at 1106); accordingly, it
follows that an adequate written description of a genus cannot be achieved in the 
absence of a disclosure of at least one species within the genus. Bowie et al (Science, 
1990, 247:1306-1310) teach that an amino acid sequence encodes a message that 
determines the shape and function of a protein and that it is the ability of these proteins 
to fold into unique three-dimensional structures that allows them to function, carry out 
the instructions of the genome and form immunoepitopes. Bowie et al. further teach that 
the problem of predicting protein structure from sequence data and in turn utilizing 



Application/Control Number: 10/566,426 Page 5 

Art Unit: 1645 

predicted structural determinations to ascertain functional aspects of the protein is
extremely complex (column 1, page 1306). Bowie et al further teach that while it is
known that many amino acid substitutions are possible in any given protein, the positions
within the protein's sequence where such amino acid substitutions can be made with a
reasonable expectation of maintaining function are limited. Certain positions in the
sequence are critical to the three-dimensional structure/function relationship, and these
regions can tolerate only conservative substitutions or no substitutions (column 2, page
1306). Therefore, in accordance with the Guidelines, the description of SEQ ID NO: 3
is not deemed representative of the genus of homologs of SEQ ID NO: 3 of the claimed
invention; thus, the claims do not meet the written description requirement.

Claim Rejections - 35 USC § 102 and 103 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 
A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under
section 122(b), by another filed in the United States before the invention by the
applicant for patent or (2) a patent granted on an application for patent by 
another filed in the United States before the invention by the applicant for patent, 
except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application 
filed in the United States only if the international application designated the 
United States and was published under Article 21(2) of such treaty in the English 
language. 

(e) the invention was described in a patent granted on an application for patent 
by another filed in the United States before the invention thereof by the applicant
for patent, or on an international application by another who has fulfilled the 
requirements of paragraphs (1), (2), and (4) of section 371(c) of this title before 
the invention thereof by the applicant for patent. 




The changes made to 35 U.S.C. 102(e) by the American Inventors Protection Act
of 1999 (AIPA) and the Intellectual Property and High Technology Technical
Amendments Act of 2002 do not apply when the reference is a U.S. patent resulting
directly or indirectly from an international application filed before November 29, 2000.
Therefore, the prior art date of the reference is determined under 35 U.S.C. 102(e) prior
to the amendment by the AIPA (pre-AIPA 35 U.S.C. 102(e)).


The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed 
or described as set forth in section 102 of this title, if the differences between the 
subject matter sought to be patented and the prior art are such that the subject 
matter as a whole would have been obvious at the time the invention was made 
to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was 
made. 

This application currently names joint inventors. In considering patentability of 
the claims under 35 U.S.C. 103(a), the examiner presumes that the subject matter of 
the various claims was commonly owned at the time any inventions covered therein 
were made absent any evidence to the contrary. Applicant is advised of the obligation 
under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was
not commonly owned at the time a later invention was made in order for the examiner to 
consider the applicability of 35 U.S.C. 103(c) and potential 35 U.S.C. 102(e), (f) or (g) 
prior art under 35 U.S.C. 103(a). 

Claims 1-3, 5-9, 13, 15, 17, 19, 21, 23, 25, and 28-30 are rejected under 35 U.S.C.
102(e) as being anticipated by Dixon et al (US Patent No. 6,828,092 B1, issued
December 7, 2004; US filing date January 12, 1998).
The examiner interprets "homolog comprising one or more nucleotide substitutions,
additions or deletions" to read on any promoter sequence.

Claims 1-3, 5-9, 13, 15, 17, 19, 21, 23, 25, and 28-30 are drawn to a method for
determining whether a molecule affects the function or activity of a sterol biosynthesis
pathway in a S. cerevisiae cell comprising: (a) contacting said cell with, or 
recombinantly expressing within said cell, said molecule; (b) determining whether RNA 
expression or protein expression in said cell of a target polynucleotide sequence is 
changed in step (a) relative to the expression of said target polynucleotide sequence in 
the absence of said molecule, said target polynucleotide being a sequence operatively 
linked to a promoter native to S. cerevisiae gene YMR325W, or a YMR325W promoter 
homolog comprising one or more nucleotide substitutions, additions or deletions that do 
not effect the ability of the sequence to promote transcription of said operatively linked 
sequence; and (c)
determining that said molecule affects the function or activity of said sterol biosynthesis
pathway if expression of said target polynucleotide is changed, or determining that said 
molecule does not affect the function or activity of said sterol biosynthesis pathway if 
expression of said target polynucleotide sequence is unchanged (claim 1); a method for 
monitoring activity of a sterol biosynthesis pathway in a S. cerevisiae cell exposed to a 
molecule comprising: (a) contacting said cell with, or recombinantly expressing within 
said cell, said molecule; (b) determining whether RNA expression or protein expression 
in said cell of a target polynucleotide sequence is changed in step (a) relative to 
expression of said target polynucleotide sequence in the absence of said molecule, said 
target polynucleotide sequence being regulated by a promoter native to a S. 
cerevisiae YMR325W gene or a YMR325W homolog comprising one or more nucleotide 
substitutions, additions or deletion that do not effect the ability of the sequence to 
promote regulated transcription of said target polynucleotide sequence; and (c)
determining that the activity of the sterol biosynthesis pathway in said cell is
changed if expression of said target polynucleotide is determined to be changed in step 
(b), or determining that the activity of the sterol biosynthesis pathway in said cell is 
unchanged if expression of said target polynucleotide is determined to be unchanged in 
step (b) (claim 13); a method for identifying a molecule that modulates expression of a
sterol biosynthesis pathway target polynucleotide sequence comprising: (a) 
recombinantly expressing in a S. cerevisiae cell, or contacting a S. cerevisiae 
cell with, at least one candidate molecule; and (b) measuring RNA or protein expression
in said cell of a target polynucleotide sequence, said target polynucleotide sequence
being regulated by a promoter native to a S. cerevisiae YMR325W gene or a YMR325W 
sequence homolog comprising one or more nucleotide substitutions, additions or 
deletions that do not effect the ability of the sequence to promote regulated transcription 
of said target polynucleotide sequence thereof; wherein an increase or decrease in
expression of said target polynucleotide sequence relative to expression of said target 
polynucleotide sequence in the absence of said candidate molecule indicates that said 
candidate molecule modulates expression of said sterol biosynthesis pathway target 
polynucleotide sequence (claim 22). 

Dixon et al teach a method for the identification of agents which modulate sterol 
biosynthesis which method comprises contacting a test compound with a host cell of S. 
cerevisiae comprising a DNA sequence which controls expression of a yeast 
acetoacetyl CoA thiolase gene operably linked to a reporter system such that 
modulation of sterol biosynthesis in the host cell leads to a detectable change in cell 
phenotype, and determining whether any such detectable change has occurred (see 
abstract). Dixon et al teach that one or more individual enzymes from the pathway are
selected, and compounds are screened for their ability to inhibit these enzymes. Dixon
et al teach that operably linked means linked in such a way as to provide the basic 
sequence signals necessary for initiation of gene transcription and initiation of gene 
translation. Dixon et al teach in vivo assays for inhibitors of sterol biosynthesis wherein
inhibition leads to a change in the level of expression of a reporter gene, and nucleic
acids and recombinant cells for use in the assays. Dixon et al teach that the activity of the
reporter gene in cells grown under aerobic conditions in the absence of inhibitors of sterol
biosynthesis is low. Dixon et al teach that when the promoter region of S. cerevisiae
acetoacetyl CoA thiolase is linked to a reporter gene, the reporter gene may be induced
by sterol biosynthesis inhibitors. Dixon et al teach an assay which is capable of
detecting a wide range of inhibitors of sterol biosynthesis, and that the assay is simple,
cheap and robust and may be employed in high throughput mode to screen large
chemical collections, natural product collections and compound libraries. Dixon et al
teach that the assay described here may be used in combination with another reporter 
system in the same cell, allowing for compounds to be simultaneously screened for the 
ability to modulate sterol biosynthesis and other processes. Dixon et al teach novel 
forms of green fluorescent proteins with different absorption spectra that allow the use 
of multifunctional assays in the cell using the same output. Dixon et al teach that the 
advantage of using a reporter gene as a reporter system is that it confers a readily 
measurable phenotype upon the cell and that the reporter gene may conveniently 
comprise the coding sequence of an enzyme such as firefly luciferase, E. coli
chloramphenicol acetyl transferase, or green fluorescent protein, in which the phenotype
conferred may be measured by alterations in fluorescence (see abstract, claims).

Thus Dixon et al teach a method for determining whether a molecule affects the 
function or activity and a method for monitoring activity of a sterol biosynthesis pathway 
in a S. cerevisiae cell comprising: (a) contacting said cell with, or recombinantly
expressing within said cell, said molecule; (b) determining whether RNA expression or 
protein expression in said cell of a target polynucleotide sequence is changed in step 
(a) relative to the expression of said target polynucleotide sequence in the absence of 
said molecule, said target polynucleotide being a sequence operatively linked to a
promoter native to S. cerevisiae gene YMR325W, or a YMR325W promoter homolog comprising one or
more nucleotide substitutions, additions or deletions that do not effect the ability of the
sequence to promote transcription of said operatively linked sequence thereof; and (c)
determining that said molecule affects the function or activity of said sterol biosynthesis
pathway, or that the activity of said pathway is changed, if expression of said
target polynucleotide is changed, or determining that said molecule does not affect the
function or activity of said sterol biosynthesis pathway if expression of said target 
polynucleotide sequence is unchanged, wherein said target polynucleotide sequence 
comprises a marker gene; wherein step (b) comprises determining whether the RNA 
expression or protein expression of said marker gene is changed in step (a) relative to 
the expression of said marker gene in the absence of the molecule; and wherein step
(c) comprises determining that said molecule affects the function or activity of said sterol 
biosynthesis pathway if expression of said marker gene is changed, or 
determining that said molecule does not affect the function or activity of said sterol 
biosynthesis pathway if expression of said marker gene is unchanged, wherein said 
molecule inhibits sterol biosynthesis such that said cell contacted with the molecule 
exhibits a lower level of sterol than a second cell which is not contacted with said 
molecule, wherein step (b) comprises determining whether RNA expression is changed, 
wherein step (b) comprises determining whether protein expression is changed, wherein 
step (c) comprises determining that said molecule inhibits sterol biosynthesis if 
expression of said target polynucleotide sequence in step (a) is increased relative to 
expression of said target polynucleotide sequence in the absence of said molecule, 
wherein the S. cerevisiae cell is a cell that recombinantly expresses said target 
polynucleotide sequence, wherein step (a) comprises contacting said cell with 
said molecule, and wherein step (a) is carried out in a liquid high throughput-like
assay, wherein said molecules are proteins.

Thus Dixon et al teach a method for identifying a molecule that modulates
expression of a sterol biosynthesis pathway target polynucleotide sequence comprising: 
(a) recombinantly expressing in a S. cerevisiae cell, or contacting a S. cerevisiae 
cell with, at least one candidate molecule; and (b) measuring RNA or protein expression 
in said cell of a target polynucleotide sequence, said target polynucleotide sequence 
being regulated by a YMR325W sequence homolog comprising one or more nucleotide 
substitutions, additions or deletions that do not effect the ability of the sequence to 
promote regulated transcription of said target polynucleotide sequence; wherein 
an increase or decrease in expression of said target polynucleotide sequence 
relative to expression of said target polynucleotide sequence in the absence of said 
candidate molecule indicates that said candidate molecule modulates expression of 
said sterol biosynthesis pathway target polynucleotide sequence, wherein step (a) 
comprises contacting said cell with a second, test cell, wherein said test cell produces 
said molecule, wherein said molecule is released by said test cell, wherein said 
molecule is secreted by said test cell (see abstract, claims, columns 1-2 and 5-10 and
example 1). 

Claims 1-11, 13, 15, 17, 19-23, and 25-30 are rejected under 35 U.S.C. 103(a) as
being unpatentable over Dixon et al (US Patent No. 6,828,092 B1, issued December 7, 2004;
US filing date January 12, 1998) in view of Ashby et al (WO/2000/58521, published October
5, 2000), Phillips, J. (US Patent No. 7,022,481 B2, issued April 4, 2006; US filing date
December 19, 2002), and Contreras et al (WO/2001/02550 A2, published January 11, 2001).

Claims 1-11, 13, 15, 17, 19-23, and 25-30 are drawn to a method for determining
whether a molecule affects the function or activity of a sterol biosynthesis pathway in a 
S. cerevisiae cell comprising: (a) contacting said cell with, or recombinantly expressing 
within said cell, said molecule; (b) determining whether RNA expression or protein 
expression in said cell of a target polynucleotide sequence is changed in step (a) 
relative to the expression of said target polynucleotide sequence in the absence of said 
molecule, said target polynucleotide being a sequence operatively linked to a promoter 
native to S. cerevisiae gene YMR325W, or a YMR325W promoter homolog comprising 
one or more nucleotide substitutions, additions or deletions that do not effect the ability 
of the sequence to promote transcription of said operatively linked sequence; and (c) 
determining that said molecule affects the function or activity of said sterol biosynthesis 
pathway if expression of said target polynucleotide is changed, or determining that said 
molecule does not affect the function or activity of said sterol biosynthesis pathway if 
expression of said target polynucleotide sequence is unchanged (claim 1); a method for 
monitoring activity of a sterol biosynthesis pathway in a S. cerevisiae cell exposed to a 
molecule comprising: (a) contacting said cell with, or recombinantly expressing within 
said cell, said molecule; (b) determining whether RNA expression or protein expression 
in said cell of a target polynucleotide sequence is changed in step (a) relative to 
expression of said target polynucleotide sequence in the absence of said molecule, said 
target polynucleotide sequence being regulated by a promoter native to a S. 
cerevisiae YMR325W gene or a YMR325W homolog comprising one or more nucleotide 
substitutions, additions or deletion that do not effect the ability of the sequence to 
promote regulated transcription of said target polynucleotide sequence; and (c)
determining that the activity of the sterol biosynthesis pathway in said cell is 
changed if expression of said target polynucleotide is determined to be changed in step 
(b), or determining that the activity of the sterol biosynthesis pathway in said cell is 
unchanged if expression of said target polynucleotide is determined to be unchanged in 
step (b) (claim 13); a method for identifying a molecule that modulates expression of a 
sterol biosynthesis pathway target polynucleotide sequence comprising: (a) 
recombinantly expressing in a S. cerevisiae cell, or contacting a S. cerevisiae 
cell with, at least one candidate molecule; and (b) measuring RNA or protein expression 
in said cell of a target polynucleotide sequence, said target polynucleotide sequence 
being regulated by a promoter native to a S. cerevisiae YMR325W gene or a YMR325W 
sequence homolog comprising one or more nucleotide substitutions, additions or
deletions that do not effect the ability of the sequence to promote regulated transcription
of said target polynucleotide sequence thereof; wherein an increase or decrease in
expression of said target polynucleotide sequence relative to expression of said target 
polynucleotide sequence in the absence of said candidate molecule indicates that said 
candidate molecule modulates expression of said sterol biosynthesis pathway target 
polynucleotide sequence (claim 22).

Dixon is relied upon as set forth supra. However, Dixon et al do not teach a
method using the YMR325W promoter. Dixon et al also do not teach a method wherein step
(a) comprises contacting said cell with said molecule and is carried out
in a solid plate halo assay; wherein step (a) comprises contacting said cell with said
molecule and is carried out in an agar overlay assay; wherein said
molecule is purified; wherein said molecule is not substantially purified; or wherein said
promoter comprises SEQ ID NO: 3 or a SEQ ID NO: 3 homolog comprising one or more
nucleotide substitutions, additions or deletions that do not effect the ability of the
sequence to promote transcription of said operatively linked sequence thereof.

Ashby et al teach methods of identifying genes whose expression is indicative of
activation of a particular biochemical or metabolic pathway or a common set of
biological reactions or functions in a cell ("regulon indicator genes"). Ashby et al teach
methods for identifying effectors (activators and inhibitors) of regulon target genes.
Ashby et al teach that genes regulated by regulon target genes of yeast or
their mammalian homologs may be identified by a method comprising the steps of a) overexpressing the
target gene in host cells of a matrix comprising a plurality of units of cells, the cells in
each unit containing a reporter gene operably linked to an expression control sequence
derived from a gene of a selected organism; and b) identifying genes that are either
induced or repressed by overexpression of the target gene. Ashby et al teach that yeast
cells respond by significantly up-regulating the genes encoding sterol biosynthetic
enzymes, thus synthesizing more of the enzymes that make sterols, and teach identifying
genes that are involved in sterol biosynthesis or in related metabolic pathways by
assays (see abstract, claims, pgs. 1 and 15-22).

Ashby et al teach that an isolated protein or polypeptide is one that has been separated
from naturally associated components that accompany it in its native state. Thus, a
polypeptide that is chemically synthesized or synthesized in a cellular system different
from the cell from which it naturally originates will be "isolated" from its naturally
associated components. Ashby et al teach that a protein may also be rendered substantially
free of naturally associated components by isolation, using protein purification
techniques well known in the art. Ashby et al teach that a monomeric protein is "substantially
pure," "substantially homogeneous" or "substantially purified" when at least about 60 to
75% of a sample exhibits a single polypeptide sequence. Ashby et al teach that a
substantially pure protein will typically comprise about 60 to 90% w/w of a protein
sample, more usually about 95%, and preferably will be over 99% pure. Ashby et al
teach that protein purity or homogeneity may be indicated by a number of means well
known in the art, such as polyacrylamide gel electrophoresis of a protein sample,
followed by visualizing a single polypeptide band upon staining the gel with a stain well
known in the art, and that for certain purposes higher resolution may be provided by using
HPLC or other means well known in the art for purification. Ashby et al teach that nucleic
acids of this invention include single-stranded and double-stranded DNA, RNA,
oligonucleotides, antisense molecules, or hybrids thereof and may be isolated from
biological sources or synthesized chemically or by recombinant DNA methodology.
Ashby et al teach that the nucleic acids, recombinant DNA molecules and vectors of this
invention may be present in transformed or transfected cells, cell lysates, or in partially
purified or substantially pure forms (see abstract, claims, pgs. 1 and 15-22).

Ashby et al teach that a S. cerevisiae protein has homology to a protein from
another organism if the encoded amino acid sequence of the yeast protein has a similar
sequence to the encoded amino acid sequence of a protein of a different
organism. Ashby et al teach that a S. cerevisiae protein may have homology or be
homologous to another S. cerevisiae protein if the two proteins have similar amino acid
sequences (see abstract, claims, pgs. 1 and 15-22). Ashby et al thus teach a method
wherein said molecule is purified, or wherein said molecule is not substantially purified.

Phillips teaches a method in a pathway in S. cerevisiae as set forth supra, wherein
step (a) comprises contacting said cell with said molecule, and wherein step (a) is 
carried out in a solid plate halo assay, wherein step (a) comprises contacting said cell 
with said molecule, and wherein step (a) is carried out in an agar overlay assay (see 
claims). 

Contreras et al. teach a sequence that is 98% identical to SEQ ID NO: 3, thus
teaching a YMR325W sequence homolog comprising one or more nucleotide
substitutions, additions or deletions that do not effect the ability of the sequence to
promote regulated transcription of said target polynucleotide sequence thereof (see
claim 1, figure 1, and STIC results).

It would have been prima facie obvious at the time the invention was made to
incorporate a YMR325W sequence homolog comprising one or more nucleotide
substitutions, additions or deletions that do not effect the ability of the sequence to
promote regulated transcription of said target polynucleotide sequence thereof, as taught
by Contreras et al, because Contreras et al teach protein and coding sequences of
apoptosis-associated proteins from the yeast Saccharomyces cerevisiae that can be
used to identify treatments for yeast infections (see abstract). It would have been prima
facie obvious at the time the invention was made to incorporate a method in a pathway
in S. cerevisiae as set forth supra, wherein step (a) comprises contacting said cell with
said molecule and is carried out in a solid plate halo assay, or wherein
step (a) comprises contacting said cell with said molecule and is
carried out in an agar overlay assay, as taught by Phillips, because both Phillips and
Dixon et al teach the same method of determining the function, monitoring, and
identifying a molecule in a biosynthesis pathway of S. cerevisiae. It would have been
prima facie obvious at the time the invention was made to incorporate a method
wherein said molecule is purified, or wherein said molecule is not substantially purified, as
taught by Ashby et al, because both Ashby et al and Dixon et al teach the same method
of determining the function, monitoring, and identifying a molecule in a biosynthesis
pathway of S. cerevisiae.

Status of the Claims 

Claims 1-11, 13, 15, 17, 19-23, and 25-30 are rejected.
No claims are allowed. 

Conclusion 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Nina A. Archie, whose telephone number is 571-272-9938.
The examiner can normally be reached Monday-Friday, 8:30 a.m.-5:00 p.m.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Shanon Foley, can be reached at 571-272-0898. The fax phone number for
the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 



Application/Control Number: 10/566,426
Art Unit: 1645 

Nina Archie 
Examiner 
Art Unit 1645 

/Mark Navarro/ 

Primary Examiner, Art Unit 1645 




/Nina A Archie/ 
Examiner, Art Unit 1645 
/N. A. A./ 

Examiner, Art Unit 1645 



Notice of References Cited

Application/Control No.: 10/566,426
Applicant(s)/Patent Under Reexamination: PHILLIPS, JOHN W.
Examiner: Nina A. Archie
Art Unit: 1645
Page 1 of 1

U.S. PATENT DOCUMENTS
     Document Number   Date (MM-YYYY)   Name                Classification
A    US-6,828,092      12-2004          Dixon et al.        435/6
B    US-7,022,481      04-2006          Phillips, John W.   435/6

FOREIGN PATENT DOCUMENTS
     Document Number   Date (MM-YYYY)   Country   Name
N    WO/2000/58521     10-2000          US        Ashby et al.
O    WO/2001/02250     01-2001          BE        Contreras et al.

NON-PATENT DOCUMENTS
(Include as applicable: Author, Title, Date, Publisher, Edition or Volume, Pertinent Pages)
U    Bowie et al. (Science, 1990, 247:1306-1310)

*A copy of this reference is not being furnished with this Office action. (See MPEP § 707.05(a).)
Dates in MM-YYYY format are publication dates. Classifications may be US or foreign.

U.S. Patent and Trademark Office
PTO-892 (Rev. 01-2001) Notice of References Cited. Part of Paper No. 20080617




PATENT Case No. RS0212

THE UNITED STATES PATENT AND TRADEMARK OFFICE

Applicants: W. Phillips
Serial No. 10/566,426
Filed: January 30, 2006
For: Methods for Using a Sterol Biosynthesis Pathway Reporter Gene to Screen for Antifungal or Lipid Lowering Compounds



Art Unit:
Examiner: To Be Assigned

Commissioner for Patents
P.O. Box 1450
Alexandria, Virginia 22313-1450

INFORMATION DISCLOSURE STATEMENT
UNDER 37 C.F.R. 1.97



Sir:

1. In compliance with 37 C.F.R. 1.97, submitted on the attached form herewith is a list of patents, publications
or other information which are requested to be made of record in this application. This Information Disclosure Statement
is not an admission that any patent, publication or other information referred to herein is "prior art" for this invention.
In accordance with 37 C.F.R. 1.97(h), the filing of this Information Disclosure Statement shall not be construed to be
an admission that the information cited in the Statement is, or is considered to be, material to patentability as defined
in 37 C.F.R. 1.56(b).

2. In accordance with 37 C.F.R. 1.97(g), the filing of this Information Disclosure Statement shall not be
construed to mean that a search has been made.

3. Applicants respectfully request that the Examiner initial the attached form after reviewing the pertinence of
each reference.

4. Pursuant to 37 C.F.R. 1.98(a)(2)(ii), copies of each cited U.S. patent and each U.S. patent application
publication are not enclosed herewith.



I hereby certify that this correspondence is being
deposited with the United States Postal Service as first
class mail in an envelope addressed to: Commissioner for
Patents, P.O. Box 1450, Alexandria, Virginia 22313-1450,
on the date appearing below.



ROSETTA INPHARMATICS LLC 



ALL REFERENCES CONSIDERED EXCEPT WHERE LINED THROUGH. /NA/

Computer generated form IDS Letter - Rosetta (IDS Folder), Merck & Co., Inc., 02/09/2005




5. Pursuant to 37 C.F.R. 1.98(d), copies of references listed on the attached form that were submitted to or cited
by the Office in a related application upon which the instant application relies for an earlier filing date under 35 U.S.C. 120
are not enclosed. Related application(s) in which references were submitted to or cited by the Office are as follows:

RELATED APPLICATION
U.S. SERIAL NUMBER    FILING DATE    MERCK CASE







































If this is inconvenient, additional copies will be submitted upon request. 



6. In accordance with 37 C.F.R. 1.97 (check one):

the attached information is filed within three months of the filing date of the captioned case.

the attached information is filed more than three months after the filing date but prior to the mailing of a first
Office Action on the merits.



the attached information is being filed more than three months after the filing date and after the mailing of a first 
Office Action on the merits, but before the mailing date of a Final Action or Notice of Allowance. The enclosed 
authorization is therefore given to charge Deposit Account No. 13-2755 for the fee required under 37 C.F.R. 
1.17(p). 

the undersigned certifies that each item of information contained in this Information Disclosure Statement was first 
cited in any communication from a foreign patent office in a counterpart foreign application not more than three 
months prior to the filing of this Statement. 

the undersigned certifies that no item of information contained in this Information Disclosure Statement was cited in 
a communication from a foreign patent office in a counterpart foreign application, and, to the knowledge of the 
person signing the certification after making reasonable inquiry, was known to any individual designated under 37 
C.F.R. 1.56(c) more than three months prior to the filing of this Statement. 



Respectfully submitted, 




By: R. Douglas Bradley
Attorney For Applicant(s)
Reg. No. 44,553

ROSETTA INPHARMATICS LLC
401 Terry Avenue North
Seattle, WA 98109

(206) 802-6301






Approved for use through 7/31/2006. OMB 0651-0031
SUBSTITUTE for PTO/SB/08A (08-03), Information Disclosure Statement by Applicant
Patent and Trademark Office; U.S. DEPARTMENT OF COMMERCE

Substitute for form 1449A/PTO
INFORMATION DISCLOSURE STATEMENT BY APPLICANT
(use as many sheets as necessary)
Sheet    of

COMPLETE IF KNOWN
Application Number: 10/566,426
Filing Date: January 30, 2006
First Named Inventor: Phillips
Group Art Unit:
Examiner Name: To Be Assigned
Attorney Docket Number: RS0212




U.S. PATENT DOCUMENTS
Cite No.   U.S. Patent Document   Name of Patentee or Applicant   Date of Publication (MM-DD-YYYY)
A1         5,569,588              Ashby et al.                    10/29/1996
A2         5,777,888              Rine et al.                     07/07/1998
A3         5,965,352              Stoughton et al.                10/12/1999
A4         6,057,101              Nandabalan et al.               05/02/2000
A5         6,064,754              Parekh et al.                   05/16/2000
A6         6,083,693              Nandabalan et al.               07/04/2000
A7         6,165,709              Friend et al.                   12/26/2000
A8         6,271,002              Lindsley et al.                 08/07/2001
A9         6,324,479              Friend et al.                   11/27/2001
FOREIGN PATENT DOCUMENTS
Cite No.   Office   Number        Name of Patentee or Applicant   Date of Publication (MM-DD-YYYY)
B1         PCT      WO 98/38329   Friend et al.                   09/03/1998
B2         PCT      WO 00/58521   Ashby et al.                    10/05/2000
Examiner Signature:
Date Considered:

*Examiner: Initial if reference considered, whether or not citation is in conformance with MPEP 609. Draw line through citation if not in conformance and not considered. Include copy of this form with next communication to applicant.




NON PATENT LITERATURE DOCUMENTS
(Include name of the author, title, date, page(s), volume-issue number(s) and place of publication.)

C1   Alam J; Cook JL; "Reporter genes: application to the study of mammalian gene transcription" 1990, Anal Biochem 188(2):245-254
C2   Bachmair F; Huber CG; Daxenbichler G; "Quantitation of gene expression by RT-PCR and HPLC analysis of PCR products" 2002, Methods Mol Biol 193:103-116
C3   Balkis MM; Leidich SD; Mukherjee PK; Ghannoum MA; "Mechanisms of fungal resistance: an overview" 2002, Drugs 62(7):1025-1040
C4   Blanchard AP; Hood L; "Sequence to array: probing the genome's secrets" 1996, Nat Biotechnol 14(13):1649
C5   Blondelle SE; Houghten RA; "Novel antimicrobial compounds identified using synthetic combinatorial library technology" 1996, Trends Biotechnol 14(2):60-65
C6   Bojase G; Majinda RR; Gashe BA; Wanjala CC; "Antimicrobial flavonoids from Bolusanthus speciosus" 2002, Planta Med 68(7):615-620
C7   Bussey H; Kaback DB; Zhong W; Vo DT; Clark MW; Fortin N; Hall J; Ouellette BF; Keng T; Barton AB; "The nucleotide sequence of chromosome I from Saccharomyces cerevisiae" 1995, Proc Natl Acad Sci USA 92(9):3809-3813
C8   Cubitt AB; Heim R; Adams SR; Boyd AE; Gross LA; Tsien RY; "Understanding, improving and using green fluorescent proteins" 1995, Trends Biochem Sci 20(11):448-455
C9   Dainese P; Staudenmann W; Quadroni M; Korostensky C; Gonnet G; Kertesz M; James P; "Probing protein function using a combination of gene knockout and proteome analysis by mass spectrometry" 1997, Electrophoresis 18(3-4):432-442
C10  Dimster-Denk D; Rine J; Phillips J; Scherer S; Cundiff P; DeBord K; Gilliland D; Hickman S; Jarvis A; Tong L; Ashby M; "Comprehensive evaluation of isoprenoid biosynthesis regulation in Saccharomyces cerevisiae utilizing the Genome Reporter Matrix" 1999, J Lipid Res 40(5):850-860
C11  Dujon B; Alexandraki D; Andre B; Ansorge W; Baladron V; Ballesta JP; Banrevi A; Bolle PA; Bolotin-Fukuhara M; Bossier P; et al.; "Complete DNA sequence of yeast chromosome XI" 1994, Nature 369(6479):371-378
C12  Feldmann H; Aigle M; Aljinovic G; Andre B; Baclet MC; Barthe C; Baur A; Becam AM; Biteau N; Boles E; et al.; "Complete DNA sequence of yeast chromosome II" 1994, EMBO J 13(24):5795-5809
C13  Galibert F; Alexandraki D; Baur A; Boles E; Chalwatzis N; Chuat JC; Coster F; Cziepluch C; De Haan M; Domdey H; Durand P; Entian KD; Gatius M; Goffeau A; Grivell LA; Hennemann A; Herbert CJ; Heumann K; Hilger F; Hollenberg CP; Huang ME; Jacq C; Jauniaux JC; Katsoulou C; Karpfinger-Hartl L; "Complete nucleotide sequence of Saccharomyces cerevisiae chromosome X" 1996, EMBO J 15(9):2031-2049
C14  Gorman JA; Chang LP; Clark J; Gustavson DR; Lam KS; Mamber SW; Pirnik D; Ricca C; Fernandes PB; O'Sullivan J; "Ascosteroside, a new antifungal agent from Ascotricha amphitricha. I. Taxonomy, fermentation and biological activities" 1996, J Antibiot (Tokyo) 49(6):547-552
C15  Heller RA; Schena M; Chai A; Shalon D; Bedilion T; Gilmore J; Woolley DE; Davis RW; "Discovery and analysis of inflammatory disease-related genes using cDNA microarrays" 1997, Proc Natl Acad Sci USA 94(6):2150-2155





C16  Humphery-Smith I; Cordwell SJ; Blackstock WP; "Proteome research: complementarity and limitations with respect to the RNA and DNA worlds" 1997, Electrophoresis 18(8):1217-1242
C17  Johnston M; Andrews S; Brinkman R; Cooper J; Ding H; Dover J; Du Z; Favello A; Fulton L; Gattung S; et al.; "Complete nucleotide sequence of Saccharomyces cerevisiae chromosome VIII" 1994, Science 265(5181):2077-2082
C18  Lashkari DA; DeRisi JL; McCusker JH; Namath AF; Gentile C; Hwang SY; Brown PO; Davis RW; "Yeast microarrays for genome wide parallel genetic and gene expression analysis" 1997, Proc Natl Acad Sci USA 94(24):13057-13062
C19  Liang P; Pardee AB; "Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction" 1992, Science 257(5072):967-971
C20  Lockhart DJ; Dong H; Byrne MC; Follettie MT; Gallo MV; Chee MS; Mittmann M; Wang C; Kobayashi M; Horton H; Brown EL; "Expression monitoring by hybridization to high-density oligonucleotide arrays" 1996, Nat Biotechnol 14(13):1675-1680
C21  Martel RR; Rounseville MP; Botros IW; Seligmann BE; "Multiplexed Chemiluminescent Assays in ArrayPlates for High-Throughput Measurement of Gene Expression" 2002, Proceedings of SPIE 4626:35-43
C22  McCormack PJ; Wildman HG; Jeffries P; "Production of antibacterial compounds by phylloplane-inhabiting yeasts and yeastlike fungi" 1994, Appl Environ Microbiol 60(3):927-931
C23  Muller PY; Janovjak H; Miserez AR; Dobbie Z; "Processing of gene expression data generated by quantitative real-time RT-PCR" 2002, Biotechniques 32(6):1372-1379
C24  Rahalison L; Hamburger M; Hostettmann K; Monod M; Frenk E; "A Bioautographic Agar Overlay Method for the Detection of Antifungal Compounds from Higher Plants" 1991, Phytochemical Analysis 2:199-203
C25  Rho K; Jeong H; Kahng B; "Identification of essential and functionally moduled genes through the microarray assay" 2003, 1-21, http://arxiv.org/abs/cond-mat/0301110
C26  Rios JL; Recio MC; Villar A; "Screening methods for natural products with antimicrobial activity: a review of the literature" 1988, J Ethnopharmacol 23(2-3):127-149
C27  Rose M; Botstein D; "Construction and use of gene fusions to lacZ (beta-galactosidase) that are expressed in yeast" 1983, Methods Enzymol 101:167-180
C28  Rothstein RJ; "One-step gene disruption in yeast" 1983, Methods Enzymol 101:202-211
C29  Schena M; Shalon D; Davis RW; Brown PO; "Quantitative monitoring of gene expression patterns with a complementary DNA microarray" 1995, Science 270(5235):467-470
C30  Sternberg S; "The emerging fungal threat" 1994, Science 266(5191):1632-1634





C31  Velculescu VE; Zhang L; Vogelstein B; Kinzler KW; "Serial analysis of gene expression" 1995, Science 270(5235):484-487
C32  Wang Z; Brown DD; "A gene expression screen" 1991, Proc Natl Acad Sci USA 88(24):11505-11509
C33  Wodicka L; Dong H; Mittmann M; Ho MH; Lockhart DJ; "Genome-wide expression monitoring in Saccharomyces cerevisiae" 1997, Nat Biotechnol 15(13):1359-1367
Examiner Signature: /Nina Archie/
Date Considered: 06/17/2008




/NA/    International Search Report for PCT/US2004/24034, dated January 13, 2005
/N.A./  Written Opinion of the International Searching Authority for PCT/US2004/24034



Examiner Signature: /Nina Archie/
Date Considered: 06/17/2008

SEND TO: Commissioner for Patents, P.O. Box 1450, Alexandria, VA 22313-1450.



PCT

WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau

(51) International Patent Classification 7: C12Q 1/68
(11) International Publication Number: WO 00/58521 A2
(43) International Publication Date: 5 October 2000 (05.10.00)



(21) International Application Number: PCT/US00/08604
(22) International Filing Date: 31 March 2000 (31.03.00)

(30) Priority Data:
60/127,223    31 March 1999 (31.03.99)    US



(71) Applicant (for all designated States except US): ROSETTA
INPHARMATICS, INC. [US/US]; 12040 115th Ave. N.E.,
Kirkland, WA 98304 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): ASHBY, Matthew
[US/US]; 91 Longfellow Rd., Mill Valley, CA 94941 (US).
SCHERER, Stewart [US/US]; 3938 Paseo Grande, Moraga,
CA 94556 (US). PHILLIPS, John [US/US]; 7363 N.E.
112th St., Kirkland, WA 98034 (US). ZIMAN, Michael
[US/US]; 3615 Whitman Ave. N., #302, Seattle, WA
98103 (US). MARINI, Nicholas [US/US]; 60 Fountain St.,
San Francisco, CA 94114 (US).

(74) Agents: HALEY, James, F., Jr. et al.; Fish & Neave, 1251
Avenue of the Americas, New York, NY 10020 (US).



(81) Designated States: AE, AG, AL, AM, AT, AU, AZ, BA, BB,
BG, BR, BY, CA, CH, CN, CR, CU, CZ, DE, DK, DM,
DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL,
IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU,
LV, MA, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT,
RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ,
UA, UG, US, UZ, VN, YU, ZA, ZW, ARIPO patent (GH,
GM, KE, LS, MW, SD, SL, SZ, TZ, UG, ZW), Eurasian
patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR,
IE, IT, LU, MC, NL, PT, SE), OAPI patent (BF, BJ, CF,
CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).



Published
Without international search report and to be republished
upon receipt of that report.

(54) Title: METHODS FOR THE IDENTIFICATION OF REPORTER AND TARGET MOLECULES USING COMPREHENSIVE GENE
EXPRESSION PROFILES

(57) Abstract

The present invention relates to methods of identifying genes whose expression is indicative of activation of a particular biochemical
or metabolic pathway or a common set of biological reactions or functions in a cell ("regulon indicator genes"). The present invention
provides an example of such an indicator gene. The present invention also relates to methods of partially characterizing a gene of unknown
function by determining which biological pathways, reactions or functions its expression is associated with, thereby placing the gene within a
functional genetic group or "regulon". These partially characterized genes may be used to identify desirable therapeutic targets of biological
pathways of interest ("regulon target genes"). The present invention provides examples of such target genes. Methods for identifying
effectors (activators and inhibitors) of regulon target genes are provided. The present invention also provides examples of regulon target






WO 00/58521    PCT/US00/08604

Methods for the Identification of Reporter and Target Molecules Using
Comprehensive Gene Expression Profiles

TECHNICAL FIELD OF THE INVENTION
The present invention relates to methods of identifying genes whose
expression is indicative of activation of a particular biochemical or metabolic pathway
or a common set of biological reactions or functions in a cell ("regulon indicator
genes"). The present invention provides an example of such an indicator gene. The
present invention also relates to methods of partially characterizing a gene of unknown
function by determining which biological pathways, reactions or functions its
expression is associated with, thereby placing the gene within a functional genetic
group or "regulon". These partially characterized genes may be used to identify
desirable therapeutic targets of biological pathways of interest ("regulon target
genes"). The present invention provides examples of such target genes. Methods for
identifying effectors (activators and inhibitors) of regulon target genes are provided.
The present invention also provides examples of regulon target gene inhibitors.

BACKGROUND OF THE INVENTION
The sequencing of the S. cerevisiae genome marked the first complete,
ordered set of genes from a eukaryotic organism, and revealed the presence of over
6,000 genes on 16 chromosomes (Mewes et al., 1997; Goffeau et al., 1996). The
DNA sequence revealed the presence of 6275 known and hypothetical open reading
frames (ORFs) encoding putative proteins longer than 99 amino acids. Based
upon codon usage, which can serve as a predictor of whether or not an ORF is actually
expressed, there are currently thought to be 6222 expressed ORFs (Cherry et al.,
1997).
The sequences of the roughly 6,000 ORFs in the yeast genome are
compiled in the Saccharomyces Genome Database (SGD). The SGD provides Internet
access to the complete genomic sequence of S. cerevisiae, its ORFs, and the putative
polypeptides encoded by these ORFs. The SGD can be accessed via the World Wide
Web at http://genome-www.stanford.edu/Saccharomyces/ and
http://www.mips.biochem.mpg.de/mips/yeast/. A gazetteer and genetic and physical
maps of S. cerevisiae are found in Mewes et al., 1997 (incorporated herein by
reference). References therein also contain the sequence of each chromosome of S.
cerevisiae (incorporated herein by reference).

Having the complete DNA sequence of yeast available creates an
opportunity to take a collectivist, rather than a reductionist, view of biology. We have
developed a new technology that enables the simultaneous measurement of gene
expression across an entire genome. The Genome Reporter Matrix™ (GRM) is a
matrix of units comprising living yeast cells, the cells in each unit containing one yeast
reporter fusion (GRM construct) representative of essentially every known
hypothetical ORF of S. cerevisiae. See U.S. Pat. Nos. 5,569,588 and 5,777,888. A
GRM construct comprises the promoter, 5' upstream untranslated region and usually
the first four amino acids from one of each hypothetical ORF fused to a gene encoding
an easily assayed reporter, such as green fluorescent protein (GFP), luciferin, or
β-galactosidase. For a few GRM constructs, one to ten of the first amino acids from a
hypothetical ORF are fused to the reporter. In addition, for those ORFs that have an
intron, the entire first exon and usually the first four amino acids of the second exon
are fused to the reporter. The GRM constructs are able to reveal changes in
transcription for each hypothetical ORF in response to specific stimuli. In addition, the
GRM constructs are able to reveal changes in mRNA splicing, translation and protein
stability in those cases in which the N-terminus of the protein is sufficient for
regulation.

The GRM provides an unprecedented view into the compensatory
changes a cell makes in the face of a changing environment. Such environmental
changes may be in the form of pH, salinity, temperature, osmotic pressure, nutrient
availability, as well as biochemical perturbations caused by xenobiotics, pharmaceutical
compounds and mutation. Identifying the compensatory changes a cell makes in
response to exposure to a chemical can provide insight into the biological target of the
chemical. For example, treatment of the GRM with the cholesterol-lowering drug
lovastatin causes the cells to become depleted for sterols and non-sterol isoprenoids.
The yeast cells respond by significantly up-regulating the genes encoding sterol
biosynthetic enzymes and thus synthesizing more of the enzymes that make sterols.
One may identify those genes that are involved in sterol biosynthesis or in related
metabolic pathways by assaying the GRM. Because natural selection operates on a
selected outcome rather than on a particular molecular mechanism, gene expression
profiling strategies that detect regulatory changes through several molecular
mechanisms contribute to a fuller view of how regulatory circuits have evolved.

An understanding of the regulatory circuits of yeast serves two
purposes. On the one hand, yeast is an ideal model system for eukaryotic cells,
including mammalian cells. Therefore, an understanding of the metabolic pathways of
yeast can be used to design or discover drugs for use in plants and animals, including
humans. On the other hand, yeast possess certain metabolic pathways and genes which
are unique to yeast. An understanding of the differences between yeast and higher
eukaryotes will permit the design and discovery of antifungal drugs that target genes
and metabolic pathways specific to yeast. See U.S. Serial No. 60/127,272, filed
concurrently herewith.

Yeast cells are eukaryotic and have many pathways that are similar or
identical to those of mammalian cells. However, because yeast cells are unicellular,
they are easier to manipulate experimentally and the results of such manipulations are
easier to determine. Thus, yeast serves as an ideal model system for eukaryotic cells,
including mammalian cells. The deduced protein sequences of the yeast genome
display a significant amount of sequence identity with mammalian proteins. About
one-third of the yeast ORFs, when aligned with their mammalian counterparts, produce
a P-value score of less than 1 x lO'^^ (Botstein et al., 1997). This number may in fact
be a significant underestimate because the alignments were done with GenBank entries
that make up only about 10-20% of the unique human protein sequences thought to
exist.

The evolutionary conservation between yeast and humans is not limited
to sequence identity. The list of human genes that can functionally substitute for their
yeast counterparts is extensive. For example, H-Ras (Kataoka et al., 1985), HMG-CoA
reductase (Basson et al., 1988) and the heme A:farnesyltransferase (Glerum and
Tzagoloff, 1994) have been shown to functionally replace their yeast counterparts.
Researchers have utilized this evolutionary conservation to clone mammalian genes
through their ability to complement the corresponding yeast mutants. Two examples
include CDC2 (Lee and Nurse, 1987) and CDK2 (Elledge and Spottswood, 1991).

Functional conservation between yeast and humans may be best
illustrated by the notable lack of antifungal therapeutic agents available for safely
treating systemic infections in humans. Antifungal agents certainly exist, but they are
characterized by profound side effects likely caused by inhibition of the mammalian
counterparts of the yeast target. L-659,699, lovastatin, and zaragozic acid inhibit
different steps in the yeast sterol pathway (HMG-CoA synthase, HMG-CoA reductase,
and squalene synthase, respectively). These inhibitors are also potent inhibitors of the
corresponding mammalian enzymes (Correll and Edwards, 1994). In addition, we have
found that in experiments with over 100 pharmaceutical agents used to treat a variety
of distinct clinical indications in mammals, approximately 80% produced significant
changes in gene expression in the GRM, indicating that there is substantial overlap in
drug specificity between mammalian and yeast systems.

Yeast also contain genes that encode proteins that do not have plant
and/or animal homologs. These non-homologous genes may be used as targets for the
design and discovery of highly specific antifungal agents for use in plants and animals,
including humans. The GRM may be used to identify genes that are expressed in
particular metabolic pathways. Non-homologous genes in a pathway of interest may
be used as targets for design and discovery of antifungal agents, for instance. See,
e.g., U.S. Serial No. 60/127,272, filed concurrently herewith.

One metabolic pathway of interest for identification of both
homologous and non-homologous genes is the pathway for synthesis of isoprenoids.
Eukaryotic cells utilize a group of structurally related compounds, the isoprenoids, for
a vast array of cellular processes. These processes include structural composition of
the lipid bilayer, electron transport during respiration, protein glycosylation, tRNA
modification, and protein prenylation. All isoprenoids are synthesized via a pathway
known variously as the isoprenoid pathway, mevalonate pathway, or sterol
biosynthetic pathway. Although the bulk end product of the pathway is sterols, there
are several branches of the pathway that lead to non-sterol isoprenoids. Due to the
involvement of isoprenoids in a variety of physiologically and medically important
processes, a comprehensive understanding of the regulation of this pathway would
offer many scientific and practical benefits.

The regulation of the isoprenoid biosynthetic pathway is known to be
complex in all eukaryotic organisms examined, including S. cerevisiae. The overriding
principle for the regulation of this pathway is multiple levels of feedback inhibition.
This feedback regulation is keyed to multiple intermediates and appears to act at
numerous steps of the pathway, involving changes in transcription, translation and
protein stability. Additionally, the availability of molecular oxygen, required for sterol
and heme biosynthesis, also regulates the expression of genes at key steps of the
pathway. The emerging picture is that the isoprenoid pathway has numerous points of
regulation that act to control overall flux through the pathway as well as the relative
flux through various branches of the pathway.

Given the complexity of the isoprenoid pathway, it can be difficult to
understand the regulation of any one step of this pathway unless it is viewed within
the context of the entire pathway. Thus, the GRM is ideal for understanding the
regulation of the isoprenoid pathway because one may observe the regulation of all the
yeast genes involved in the isoprenoid pathway at one time by using the GRM. In
addition, analysis of the gene expression data provided by the GRM (preferably using
software described below) may provide information about which particular genes in the
isoprenoid pathway are important regulatory genes in the pathway, which are
important indicator genes of the isoprenoid pathway, and which are suitable
targets to regulate isoprenoid synthesis.

Today we have the luxury of reflecting upon the wealth of information
that has come from decades of research into the cell biology and genetics of yeast.
Still, less than 20% of the hypothetical ORFs discovered by the yeast genome project
had been previously identified through basic research (Goffeau et al., 1996).
Additionally, 25% of the yeast ORFs with obvious human homologs have no known
function (Botstein et al., 1997). The situation will likely be the same when the human
genome sequence is completed.

Several research groups have created software programs that enable the
comparison of both chemical and genetic expression profiles to identify related gene
expression response patterns, as shown, for example, in Figure 38. In addition,
expression changes of individual genes in response to any given treatment can often be
accessed through hypertext links. Currently, our software will: 1) normalize
expression data; 2) rank changes in individual genes' expression relative to a particular
treatment; 3) rank similarities between genomic expression profiles resulting from a
chemical or genetic treatment; and 4) determine the correlation coefficient for an
individual gene's expression relative to that of all other genes to identify regulons, or
groups of genes that share the same regulatory programs. See United States
Application 09/076,668, now pending; Eisen et al. (1998); and Tavazoie et al. (1999).
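The four software functions listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software referred to in the text: normalization is assumed to mean the natural log ratio of treated to untreated expression (as defined in the legend to Figure 3), similarity is measured with the Pearson correlation coefficient, and the function names and the 0.8 cutoff are hypothetical.

```python
import math


def log_ratios(treated, untreated):
    """Step 1 (assumed normalization): natural-log ratio of treated/untreated
    expression for each gene. Both dicts map gene -> positive expression value."""
    return {gene: math.log(treated[gene] / untreated[gene]) for gene in treated}


def rank_genes_by_change(ratios):
    """Step 2: genes ordered by the magnitude of their change for one treatment."""
    return sorted(ratios, key=lambda gene: abs(ratios[gene]), reverse=True)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def rank_similar_profiles(profiles, query):
    """Step 3: rank other treatments by similarity of their genome-wide profile
    (treatment -> list of per-gene log ratios, same gene order) to `query`."""
    others = [t for t in profiles if t != query]
    return sorted(others, key=lambda t: pearson(profiles[query], profiles[t]),
                  reverse=True)


def correlated_genes(gene_profiles, query, min_r=0.8):
    """Step 4: genes whose expression across treatments (gene -> list of log
    ratios, same treatment order) correlates with `query` at r >= min_r."""
    scores = {g: pearson(gene_profiles[query], v)
              for g, v in gene_profiles.items() if g != query}
    return sorted((g for g, r in scores.items() if r >= min_r),
                  key=lambda g: scores[g], reverse=True)
```

The same `pearson` function serves both step 3 (comparing treatments across genes) and step 4 (comparing genes across treatments); only the orientation of the data changes.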

The ability to assign ORFs to functional groups based upon their
expression patterns will provide valuable information pertaining to the function of
proteins from model organisms as well as their mammalian counterparts. Analysis of
genomic expression patterns may also reveal upstream regulatory sequences, including
promoters, with great utility for regulated or constitutive expression of recombinant
genes. Such regulated sequences can be used for making reporter constructs for any
selected process intrinsic to a given genome.

These functional genomics studies will provide a great deal of
information that can implicate yeast genes, as well as their mammalian counterparts, in
a variety of cellular functions. Associations of particular genes with specific biological
pathways will be made by virtue of the genes' patterns of regulation under numerous

One particular problem in the prior art has been identifying genes whose
expression is representative of a specific biological (e.g., metabolic) pathway. One
would like to be able to measure the expression of a gene or its encoded protein to
indicate the effect of a particular treatment on a specific pathway. Thus, there is a
need for pathway indicator genes for each of the various metabolic pathways.

A second problem in the prior art has been identifying genes and their
encoded proteins which can be efficient targets within a specific biochemical pathway
or set of associated pathways. Once good targets have been identified, pharmaceutical
compounds and treatments may be designed or discovered to regulate the expression
or activity of the target gene or protein.

SUMMARY OF THE INVENTION 
The instant invention addresses the above problems by providing a
method using genomic arrays, such as the GRM or hybridization arrays, for identifying
indicator genes that are specific for particular biochemical pathways and sensitive to
perturbations of these pathways. The instant invention provides one such gene, HES1,
which is an indicator for the isoprenoid metabolic pathway. The invention provides the
polynucleotide sequence of HES1 and vectors and host cells comprising this sequence.
The invention also provides a method of producing HES1 recombinantly. The
invention further provides methods of using HES1 as a specific indicator of the state of
the isoprenoid pathway to identify compounds that regulate that pathway.

The instant invention also provides a method for identifying targets for
one or more biochemical pathways of interest using the GRM or other types of
genomic arrays, such as hybridization arrays. The instant invention also provides a
number of ORFs and their encoded proteins which are targets for lipid metabolism,
yeast morphology, RNA metabolism and growth control. These ORFs include
YMR134w, YER034w, YJL105w, YKL077w, YGR046w, YJR041c, YER044c and
YLR100w and their encoded proteins.

The invention provides the polynucleotide sequences of these ORFs and
vectors and host cells comprising these ORFs for use in methods of identifying,
designing and discovering highly specific anti-target agents. Specific anti-target agents
include antisense nucleic acid molecules that target YMR134w, YER034w, YJL105w,
YKL077w, YGR046w, YJR041c, YER044c and YLR100w and ribozymes that cleave
RNAs encoded by these ORFs. The invention also provides methods of
recombinantly producing the proteins encoded by these ORFs for use as targets in
methods of identifying, designing and discovering highly specific antifungal agents and
for producing antibodies directed against the encoded proteins. Specific anti-target
agents include antibodies that bind to the proteins encoded by YMR134w, YER034w,
YJL105w, YKL077w, YGR046w, YJR041c, YER044c and YLR100w and small organic
molecules that bind to and inhibit proteins encoded by these ORFs.

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1. Summary of Characteristics for YJL105w.

Figure 2. Plot of changes in expression of YJL105w and CYB5 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fit of the points to a line provides an
indication of the level of coordinate gene expression. CYB5 functions in sterol
biosynthesis through its activation of the Erg11p NADPH-cytochrome P-450
reductase.

Figure 3. Regulated Expression of YJL105w. YJL105w is significantly
induced by isoprenoid biosynthetic inhibitors and mutations in HMG-CoA synthase
(hmgs). "Log Ratio" refers to the natural log ratio of treated/untreated expression
values.

Figure 4. Effects of lovastatin on wild-type and YJL105w knockout
yeast strains. 10 µl of a 25 mg/ml solution of lovastatin (250 µg) in ethanol was
applied to a sterile drug disk on a lawn of yeast (5x10^ cells, ABY363). The plates
were incubated overnight at 30°C.

Figure 5. Summary of Characteristics for YMR134w.

Figure 6. Plot of changes in expression of YMR134w and ERG2 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fit of the points to a line provides an
indication of the level of coordinate gene expression. ERG2 encodes sterol isomerase.

Figure 7. Treatments Causing Highest Expression of YMR134w.
YMR134w is induced most significantly by inhibitors of the isoprenoid biosynthetic
pathway.

Figure 8. Database Searches with YMR134w. Database searches with
YMR134w did not reveal any apparent mammalian counterparts.

Figure 9. Summary of Characteristics for YER044c.

Figure 10. Plot of changes in expression of YER044c and ERG2 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fit of the points to a line provides an
indication of the level of coordinate gene expression.

Figure 11. Treatments Causing Highest Expression of YER044c.
YER044c is induced most significantly by inhibitors of the isoprenoid biosynthetic
pathway.

Figure 12. Database Searches with YER044c. Database searches with 
YER044C reveal numerous mammalian expressed-sequence tag (EST) apparent 
counterparts. 

Figure 13. Comparison of the YER044c Predicted Protein Sequence
with Mouse and Human EST Translations.

Figure 14. Comparison of the YER044c Predicted Protein Sequence 
with Rat EST Translation. 

Figure 15. Summary of Characteristics for YLR100w.

Figure 16. Plot of changes in expression of YLR100w and CYB5 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fit of the points to a line provides an
indication of the level of coordinate gene expression.

Figure 17. Treatments Causing Highest Expression of YLR100w.
YLR100w is induced most significantly by inhibitors of isoprenoid biosynthesis and a
mutation in the gene encoding Erg11p.

Figure 18. Database Searches with YLR100w. Database searches with
YLR100w reveal numerous mammalian expressed-sequence tag (EST) apparent
counterparts.

Figure 19. Alignment of YLR100w to Mammalian ESTs.

Figure 20. Summary of Characteristics for YER034w.

Figure 21. Plot of changes in expression of YER034w and GPA2 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fit of the points to a line provides an
indication of the level of coordinate gene expression. Gpa2p, encoded by GPA2, is the
alpha subunit of a heterotrimeric G protein involved in pseudohyphal growth.

Figure 22. Mutation of the YER034w Gene Leads to Increased
Pseudohyphal Growth. Cells were plated onto low nitrogen plates (0.5% agarose, 2%
glucose, 0.34% yeast nitrogen base without amino acids and ammonium sulfate,
0.05 mM ammonium sulfate, 20 µg/ml uracil, 30 µg/ml leucine, and 5 ng/ml histidine)
and incubated for four days at 25°C. Bar height represents the average number of
hyphal projections per colony (n=20).

Figure 23. Summary of Characteristics for YKL077w. 

Figure 24. Plot of changes in expression of YKL077w and SGV1 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fit of the points to a line provides an
indication of the level of coordinate gene expression. SGV1 is a Cdc28p-related
protein kinase that is essential for yeast viability.

Figure 25. Expression Correlation of YKL077w. Expression of the
YKL077w gene correlates with that of genes involved in cell wall integrity and
cytoskeletal reorganization.

Figure 26. Database Searches with YKL077w. Database searches with
YKL077w did not reveal any apparent mammalian counterparts.

Figure 27. Summary of Characteristics for YGR046w.

Figure 28. Plot of changes in expression of YGR046w and IRA2 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fit of the points to a line provides an
indication of the level of coordinate gene expression. IRA2 encodes a GTPase-activating
protein for Ras1p and Ras2p.

Figure 29. Expression Correlation of YGR046w. Expression of the
YGR046w gene is correlated with that of other genes involved in growth control.

Figure 30. Treatments Causing the Most Significant Changes in
Expression of YGR046w. Expression of YGR046w is sensitive to agents that perturb
mitochondrial function, create oxidative stress and disrupt the cytoskeleton.

Figure 31. Summary of Characteristics for YJR041c.

Figure 32. Plot of changes in expression of YJR041c and MED7 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fit of the points to a line provides an
indication of the level of coordinate gene expression. MED7 is a component of the
mediator complex involved in RNA Polymerase II transcription.

Figure 33. Expression Correlation of YJR041c. Expression of
YJR041c is correlated with that of genes involved in RNA metabolism, including RNA
polymerase I and II transcription, mRNA splicing and turnover, and ribosome function.

Figure 34. Database Searches with YJR041c. Database searches with
YJR041c did not reveal any apparent mammalian counterparts.

Figure 35. Summary of Characteristics for HES1.

Figure 36. Expression Correlation of HES1.

Figure 37. Treatments that Induce the HES1 Reporter. Inhibitors of
the isoprenoid biosynthetic pathway cause a significant induction of the HES1 reporter.

Figure 38. Browser Interface of Acacia's Expression Software.

Figure 39. YJL105w DNA Sequence.

Figure 40. YJL105w Protein Sequence.

Figure 41. YMR134w DNA Sequence. 

Figure 42. YMR134w Protein Sequence. 

Figure 43. YER044c DNA Sequence.

Figure 44. YER044c Protein Sequence.

Figure 45. Mouse EST with Similarity to YER044c. 

Figure 46. Human EST with Similarity to YER044c.

Figure 47. Rat EST with Similarity to YER044c.

Figure 48. YLR100w DNA Sequence.

Figure 49. YLR100w Protein Sequence.

Figure 50. Human EST with Similarity to YLR100w.

Figure 51. Mouse EST with Similarity to YLR100w.

Figure 52. Mouse EST with Similarity to YLR100w.

Figure 53. Mouse Gene with Similarity to YLR100w.

Figure 54. YER034w DNA Sequence.

Figure 55. YER034w Protein Sequence. 

Figure 56. YKL077w DNA Sequence.

Figure 57. YKL077w Protein Sequence.

Figure 58. YGR046w DNA Sequence.

Figure 59. YGR046w Protein Sequence. 

Figure 60. YJR041c DNA Sequence. 

Figure 61. YJR041c Protein Sequence. 

Figure 62. HES1 DNA Sequence.

Figure 63. HES1 Protein Sequence.

Figure 64. Reproducibility of the Genome Reporter Matrix™. 
Fluorescence from 864 independent untreated reporter-harboring yeast strains was 
plotted against the corresponding clones of an independent control array. 

Figure 65. Rat Gene with Similarity to YLR100w.

Figure 66. DAK1 DNA Sequence.

Figure 67. DAK1 Protein Sequence.

Figure 68. PGU1 DNA Sequence.

Figure 69. PGU1 Protein Sequence.

Figure 70. STE18 DNA Sequence.

Figure 71. STE18 Protein Sequence.


Figure 72. YGL198w DNA Sequence.

Figure 73. YGL198w Protein Sequence.

Figure 74. Each dot on the 4-quadrant plot represents a treatment
affecting the DAK1 and PGU1 reporters. Treatments are plotted according to
whether DAK1 was up-regulated (above the x-axis) or down-regulated (below the x-axis) and
whether PGU1 was up-regulated (right of the y-axis) or down-regulated (left of the
y-axis). Thus, conditions where both reporters are up-regulated are in the upper right
quadrant. Each division on the graph represents one natural log ratio change relative
to controls. The hog1 knock-out profile is indicated at the lower right. Thus,
simultaneously measuring induction of PGU1 above 2 natural log ratios and repression
of DAK1 below one natural log ratio specifically indicates Hog1p pathway inactivation.

Figure 75. The plot description is the same as for Figure 74. The
subset of treatments that target mitochondrial function form a distinct group in the
upper right quadrant (within rectangle). Thus, simultaneously measuring induction of
YGL198w and STE18 should specifically indicate perturbations of the mitochondria.
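The decision rule stated for Figure 74 can be written as a short function. This is a sketch under stated assumptions: inputs are natural log ratios of treated to control expression for the PGU1 and DAK1 reporters, "repression of DAK1 below one natural log ratio" is read as a DAK1 log ratio below -1, and the function names are hypothetical.

```python
def quadrant(pgu1, dak1):
    """Place a treatment on the Figure 74 plot: PGU1 on the x-axis, DAK1 on the
    y-axis. Values lying exactly on an axis are assigned to the lower/left side."""
    vertical = "upper" if dak1 > 0 else "lower"
    horizontal = "right" if pgu1 > 0 else "left"
    return f"{vertical} {horizontal}"


def indicates_hog1_inactivation(pgu1, dak1, pgu1_min=2.0, dak1_max=-1.0):
    """True when PGU1 is induced above `pgu1_min` and DAK1 is repressed below
    `dak1_max` (both in natural log ratio units relative to controls)."""
    return pgu1 > pgu1_min and dak1 < dak1_max
```

For example, a treatment with log ratios PGU1 = 2.5 and DAK1 = -1.5 falls in the lower right quadrant, where the hog1 knock-out profile lies, and satisfies both thresholds; a treatment with PGU1 = 1.5 lies in the same quadrant but is not called by the rule.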

DETAILED DESCRIPTION OF THE INVENTION 
Definitions and General Techniques 

Unless otherwise defined, all technical and scientific terms used herein
have the meaning commonly understood by one of ordinary skill in the art to which
this invention belongs. The practice of the present invention employs, unless otherwise
indicated, conventional techniques of chemistry, molecular biology, microbiology,
recombinant DNA, genetics and immunology. See, e.g., Maniatis et al., 1982;
Sambrook et al., 1989; Ausubel et al., 1992; Glover, 1985; Anand, 1992; Guthrie and
Fink, 1991 (which are incorporated herein by reference).

A "regulon" is a group of genes that are coordinately regulated in
response to a number of different stimuli, e.g., treatment with chemical compounds or
mutations. The member genes of a regulon comprise a functional unit by which a cell
is able to adapt to a changing environment. The regulation of these genes that led to
their categorization could be at the level of transcription, mRNA stability, splicing,
translation or protein stability. The mode of regulation of each member gene of a
given regulon need not be the same.

Genes are categorized into separate regulons based upon changes in
gene expression. In order to efficiently and accurately group genes into functional
groups, it is necessary to observe each gene's expression change. Since many genes
function in specialized roles, it is necessary to measure global gene expression under as
diverse a variety of conditions as possible. Therefore, the database of expression
profiles used in this invention was made from a diverse collection of chemicals and
mutant strains of yeast. In general, the greater the number of diverse stimuli which
cause the genes of a regulon to exhibit coordinate expression and the higher the
correlation coefficient, the more confident one will be that the regulon is a robust
indicator of the pathway or process of interest.
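Grouping genes into regulons from a database of expression profiles can be sketched as a correlation-threshold clustering. The greedy procedure, the 0.8 cutoff and the minimum number of treatments below are illustrative assumptions, not the method used to build the database described here; each profile is assumed to be a list of log ratios over the same ordered set of chemical and mutant treatments.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def group_regulons(profiles, min_r=0.8, min_treatments=3):
    """Greedily group genes whose profiles correlate at r >= min_r.

    profiles: gene -> list of log ratios, one per treatment, same treatment order.
    Each still-unassigned gene (in dict order) seeds a group containing every
    other unassigned gene that correlates with the seed at r >= min_r.
    """
    n_treatments = len(next(iter(profiles.values())))
    if n_treatments < min_treatments:
        raise ValueError("too few treatments for a meaningful correlation")
    unassigned = set(profiles)
    groups = []
    for seed in profiles:
        if seed not in unassigned:
            continue
        members = [seed] + [
            gene for gene in profiles
            if gene != seed and gene in unassigned
            and pearson(profiles[seed], profiles[gene]) >= min_r
        ]
        unassigned.difference_update(members)
        groups.append(members)
    return groups


profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [8.0, 6.0, 4.0, 2.0],
}
print(group_regulons(profiles))  # [['A', 'B'], ['C', 'D']]
```

Raising `min_treatments` or `min_r` mirrors the point made above: more diverse stimuli and higher correlation coefficients give more confidence in a grouping. A seed-based greedy pass is only one option; hierarchical clustering, as used by Eisen et al. (1998), is a common alternative.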

A "regulon indicator gene" (RIG) is a gene whose expression changes
when a particular regulon or biochemical pathway or cellular process is activated or
repressed. Although a RIG's expression may correlate with a particular biochemical
pathway, the RIG does not necessarily have to be a part of the biochemical pathway
for which it is an indicator. A RIG may comprise the entire gene, the 5' region of the
gene including the promoter and/or enhancer and all or a part of the coding region, or
a fragment, conservatively modified variant or homolog thereof which retains the
indicator function of the RIG. A RIG may be coordinately expressed with a particular
biological pathway, such that when the pathway is activated the RIG is more highly
expressed and when the pathway is repressed the RIG's expression is repressed as
well. However, the invention also encompasses RIGs in which there is an inverse
correlation with a particular pathway. In this case, activation of a pathway would lead
to a repression of RIG expression, while repression of a pathway would lead to
activation of RIG expression. Furthermore, the invention also encompasses RIGs
which are not necessarily part of the regulon, pathway or process for which they are
indicators. In this case, expression of RIGs may be activated or repressed specifically
in response to perturbations of a regulon, pathway or process even though the RIG
itself may only be indirectly related or have no apparent relationship in function to the
regulon, pathway or process.

In a preferred embodiment, a RIG is specific to a particular pathway,
wherein its expression changes most significantly when a particular pathway is
activated or repressed. Such a highly specific regulon indicator gene cannot always be
found for a pathway of interest. In such cases, more than one RIG can be identified
that, when their expression patterns are taken together, correlate with specificity to the
pathway of interest. Thus, in another preferred embodiment, a plurality of RIGs is
identified wherein the coordinated expression pattern of the plurality of RIGs is
specific to a particular biological pathway. In this preferred embodiment, expression of
each member of the plurality of RIGs may independently increase or decrease when the
biological pathway of interest is activated or repressed.
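Where no single RIG is specific enough, the combined pattern of several RIGs can be checked against an expected signature. The sketch below assumes each RIG is assigned an expected direction (+1 for induction, -1 for repression) when the pathway of interest is perturbed, together with a minimum change in natural log ratio; the threshold and the gene names in the example are hypothetical placeholders.

```python
def matches_signature(log_ratios, signature, min_change=1.0):
    """Return True if every RIG in `signature` changed in its expected direction
    by at least `min_change`.

    log_ratios: gene -> natural log ratio (treated/untreated) for one treatment.
    signature:  gene -> +1 (expected induction) or -1 (expected repression).
    """
    for gene, direction in signature.items():
        value = log_ratios.get(gene)
        if value is None:
            return False  # a RIG that was not measured cannot confirm the pattern
        if direction * value < min_change:
            return False
    return True


signature = {"RIG_A": +1, "RIG_B": -1}  # hypothetical RIGs
print(matches_signature({"RIG_A": 1.4, "RIG_B": -2.1}, signature))  # True
```

Because each RIG carries its own direction, the signature accommodates both coordinately and inversely regulated indicators in the same pattern.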

In another preferred embodiment, a RIG is highly sensitive to changes
in activation or repression of a pathway, such that even a small perturbation in
regulation of a pathway results in a change in RIG expression. In a further preferred
embodiment, a RIG has a large dynamic range, and is highly induced or repressed upon
the corresponding perturbation of the pathway to which it is correlated.

In another preferred embodiment, a RIG does not contain sequences 
that are problematic for maintaining on plasmids when introduced into host cells. Such 
sequences that may be problematic include centromeric sequences or sites that are 
particularly susceptible to recombination. 

A "target gene" or "regulon target gene" is a gene whose function it is
desirable to modulate. A target gene may consist of the entire gene, the 5' region
comprising the promoter and/or enhancer and all or a part of the coding region, or a
fragment, conservatively modified variant or homolog thereof which retains the
function of the target gene. In general, a target gene encodes a protein which is a part
of the biological (e.g., metabolic or biochemical) pathway or process whose
modulation would result in a desired outcome. In a preferred embodiment, a target
gene is a control point in such a pathway. In a more preferred embodiment, a target
gene is a control point that is relatively "upstream" in the metabolic pathway.
"Upstream" means that the target gene is involved in one of the first steps of the
metabolic pathway or process. In another more preferred embodiment, a target gene
is a control point that is relatively "downstream" but specific to a biological pathway
or a branch of that pathway or process. "Downstream" means that the target gene is
involved in one of the later steps of the pathway or process.

A "target" or "target protein" is a protein whose expression or activity
is to be modulated. A target may consist of the entire protein, or a fragment, mutein,
derivative or homolog thereof which retains the function of the target. In general, a
target is a protein within a biological pathway in which it is desired to
modulate the process in which the protein is involved. In a preferred embodiment, a
target is a control point in such a biological pathway. In a more preferred
embodiment, a target is a control point that is relatively "upstream" in the biological
pathway. "Upstream" means that the target is involved in one of the first steps of the
pathway. In another more preferred embodiment, a target is a control point that is
relatively "downstream" but specific to a biological pathway or a branch of that
pathway. "Downstream" means that the target is involved in one of the later steps of
the pathway.

A "target-dependent reporter gene" is a gene whose expression is
altered in a cell in which the target gene has been altered or inactivated, compared to
a cell which expresses the normal target gene. The expression of the target-dependent
reporter gene may increase or decrease in a cell harboring an altered or
inactivated target gene, depending upon the identity of the gene. If expression of the
target-dependent reporter gene increases in the cell harboring the altered or inactivated
target gene, then a potential inhibitor of the regulon target gene will increase
expression of the target-dependent reporter gene, and if expression of the
target-dependent reporter gene decreases in the cell, then a potential inhibitor of the
regulon target gene will decrease expression of the target-dependent reporter gene.
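The inference in the preceding paragraph, that a potential inhibitor should move a target-dependent reporter in the same direction as altering or inactivating the target gene does, can be sketched as a simple screening filter. The threshold and function name are hypothetical; values are assumed to be natural log ratios relative to untreated cells carrying the normal target gene.

```python
def candidate_inhibitors(knockout_change, compound_changes, min_change=1.0):
    """Return compounds whose effect on the reporter mimics the target knockout.

    knockout_change:  reporter log ratio in the altered/inactivated-target strain.
    compound_changes: compound -> reporter log ratio in treated wild-type cells.
    """
    if abs(knockout_change) < min_change:
        return []  # reporter is not target-dependent at this threshold
    expected_sign = 1 if knockout_change > 0 else -1
    return [compound for compound, change in compound_changes.items()
            if expected_sign * change >= min_change]
```

For example, if the reporter rises by 1.8 log units in the knockout, only compounds that raise it by at least `min_change` are kept; if it falls, only compounds that lower it are kept.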

By "pathway" is meant any biological, e.g., metabolic or biochemical,
set of concerted reactions which occur in response to a particular signal or stimulus in
a cell. The isoprenoid pathway is one example of such a pathway. Other pathways
include, without limitation, amino acid and protein synthesis, lipid synthesis, protein
and lipid glycosylation, protein modification, DNA synthesis and repair, RNA
transcription, phospholipid synthesis, nucleotide synthesis, and energy generation and
storage (e.g., glycolysis, citric acid cycle, oxidative phosphorylation, gluconeogenesis,
pentose phosphate pathway, fatty acid metabolism, glycogen and disaccharide
metabolism, amino acid degradation and the urea cycle), signal transduction and
growth control.

By "process" is meant any biological reaction or set of reactions that
occurs within a cell or organism in response to a stimulus or signal, or that
occurs during growth, homeostasis, development, differentiation or death of the cell or
organism.

An "isolated" protein or polypeptide is one that has been separated
from naturally associated components that accompany it in its native state. Thus, a
polypeptide that is chemically synthesized or synthesized in a cellular system different
from the cell from which it naturally originates will be "isolated" from its naturally
associated components. A protein may also be rendered substantially free of naturally
associated components by isolation, using protein purification techniques well known
in the art.

A monomeric protein is "substantially pure," "substantially
homogeneous" or "substantially purified" when at least about 60 to 75% of a sample
exhibits a single polypeptide sequence. A substantially pure protein will typically
comprise about 60 to 90% w/w of a protein sample, more usually about 95%, and
preferably will be over 99% pure. Protein purity or homogeneity may be indicated by a
number of means well known in the art, such as polyacrylamide gel electrophoresis of a
protein sample, followed by visualizing a single polypeptide band upon staining the gel
with a stain well known in the art. For certain purposes, higher resolution may be
provided by using HPLC or other means well known in the art for purification.

An S. cerevisiae protein has "homology" or is "homologous" to a
protein from another organism if the encoded amino acid sequence of the yeast protein
has a similar sequence to the encoded amino acid sequence of a protein of a different
organism. Alternatively, an S. cerevisiae protein may have homology or be homologous
to another S. cerevisiae protein if the two proteins have similar amino acid sequences.
Although two proteins are said to be "homologous," this does not imply that there is
necessarily an evolutionary relationship between the proteins. Instead, the term
"homologous" is defined to mean that the two proteins have similar amino acid
sequences. In addition, although in many cases proteins with similar amino acid
sequences will have similar functions, the term "homologous" does not imply that the
proteins must be functionally similar to each other.

When "homologous" is used in reference to proteins or peptides, it is
recognized that residue positions that are not identical often differ by conservative
amino acid substitutions. A "conservative amino acid substitution" is one in which an
amino acid residue is substituted by another amino acid residue having a side chain (R
group) with similar chemical properties (e.g., charge or hydrophobicity). In general, a
conservative amino acid substitution will not substantially change the functional
properties of a protein. In cases where two or more amino acid sequences differ from
each other by conservative substitutions, the percent sequence identity or degree of
homology may be adjusted upwards to correct for the conservative nature of the
substitution. Means for making this adjustment are well known to those of skill in the
art (see, e.g., Pearson et al., 1994, and Henikoff et al., 1992, herein incorporated by
reference).

The following six groups each contain amino acids that are conservative substitutions for one another:

1) Alanine (A), Serine (S), Threonine (T);
2) Aspartic Acid (D), Glutamic Acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
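The six groups can be encoded directly as a lookup, which makes it simple to count how many positions of an alignment are identical, conservative, or neither. The following is a minimal Python sketch; the function names are hypothetical, and it assumes the two sequences are already aligned to the same length without gaps.

```python
from typing import Tuple

# The six groups of mutually conservative residues listed above
# (one-letter amino acid codes).
CONSERVATIVE_GROUPS = [
    frozenset("AST"),
    frozenset("DE"),
    frozenset("NQ"),
    frozenset("RK"),
    frozenset("ILMV"),
    frozenset("FYW"),
]


def is_conservative(a: str, b: str) -> bool:
    """True if a and b are different residues from the same group."""
    a, b = a.upper(), b.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_positions(seq1: str, seq2: str) -> Tuple[int, int, int]:
    """Count (identical, conservative, other) positions in two
    equal-length, already aligned, ungapped sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    identical = conservative = other = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            identical += 1
        elif is_conservative(a, b):
            conservative += 1
        else:
            other += 1
    return identical, conservative, other
```

For example, `classify_positions("AIDK", "SVEW")` reports three conservative positions (A/S, I/V, D/E) and one non-conservative position (K/W).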

Sequence homology for polypeptides, which is also referred to as sequence identity, is typically measured using sequence analysis software. See, e.g., the Sequence Analysis Software Package of the Genetics Computer Group (GCG), University of Wisconsin Biotechnology Center, 910 University Avenue, Madison, Wisconsin 53705. Protein analysis software matches similar sequences using measures of homology assigned to various substitutions, deletions and other modifications, including conservative amino acid substitutions. For instance, GCG contains programs such as "Gap" and "Bestfit" which can be used with default parameters to determine sequence homology or sequence identity between closely related polypeptides, such as homologous polypeptides from different species of organisms or between a wild type protein and a mutein thereof.

A preferred algorithm when comparing a S. cerevisiae sequence to a database containing a large number of sequences from different organisms is the computer program BLAST, especially blastp or tblastn (Altschul et al., 1997, herein incorporated by reference). Preferred parameters for blastp are:

Expectation value: 10 (default)
Filter: seg (default)
Cost to open a gap: 11 (default)
Cost to extend a gap: 1 (default)
Max. alignments: 100 (default)
Word size: 11 (default)
No. of descriptions: 100 (default)
Substitution Matrix: BLOSUM62

The length of polypeptide sequences compared for homology will generally be at least about 16 amino acid residues, usually at least about 20 residues, more usually at least about 24 residues, typically at least about 28 residues, and preferably more than about 35 residues. When searching a database containing sequences from a large number of different organisms using a S. cerevisiae query sequence, it is preferable to compare amino acid sequences. Comparison of amino acid sequences is preferred to comparing nucleotide sequences because S. cerevisiae has significantly different codon usage compared to mammalian or plant codon usage.

Database searches using amino acid sequences can also be performed with algorithms other than blastp that are known in the art. For instance, polypeptide sequences can be compared using Fasta, a program in GCG Version 6.1. Fasta provides alignments and percent sequence identity of the regions of the best overlap between the query and search sequences (Pearson, 1990, herein incorporated by reference). For example, percent sequence identity between amino acid sequences can be determined using Fasta with its default parameters (a word size of 2 and the PAM250 scoring matrix), as provided in GCG Version 6.1, herein incorporated by reference.

The invention envisions two general types of polypeptide "homologs." Type 1 homologs are strong homologs. A comparison of two polypeptides that are Type 1 homologs would result in a blastp score of less than 1x10"^°, using the blastp algorithm and the parameters listed above. The lower the blastp score, that is, the closer it is to zero, the better the match between the polypeptide sequences. For instance, yeast lanosterol demethylase, which is a common target of antifungal agents, as discussed above, has a Type 1 homolog in humans. The probability score (e.g., blastp score) is dependent upon the size of the database. Comparison of yeast and human lanosterol demethylases produces a blastp score of 1x10'*^.

Type 2 homologs are weaker homologs. A comparison of two polypeptides that are Type 2 homologs would result in a blastp score of between 1x10* and 1x10"^^ using the Blast algorithm and the parameters listed above. One having ordinary skill in the art will recognize that other algorithms can be used to determine weak or strong homology.

The terms "no substantial homology" or "no human (or mammalian, vertebrate, amphibian, fish, insect or plant) homolog" refer to a yeast polypeptide sequence which exhibits no substantial sequence identity with a polypeptide sequence from humans, non-human mammals, other vertebrates, insects or plants. A comparison of two polypeptides which have no substantial homology to one another would result in a blastp score of greater than 1x10"*^ using the Blast algorithm and the parameters listed above. One having ordinary skill in the art will recognize that other algorithms can be used to determine whether two polypeptides demonstrate no substantial homology to each other.
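The Type 1 / Type 2 / no substantial homology scheme amounts to binning a blastp probability (expectation) score against two cutoffs, a strong one and a weak one. The Python sketch below takes both cutoffs as parameters rather than fixing them; the function name is hypothetical, and the values 1e-20 and 1e-5 used in the example are illustrative assumptions only, not the cutoffs of this specification. Treating the strong cutoff as a strict bound and the weak cutoff as inclusive is also an assumption.

```python
def classify_homology(evalue: float, strong_cutoff: float,
                      weak_cutoff: float) -> str:
    """Bin a blastp probability score into the homology classes
    defined above. Lower scores mean better matches."""
    if not 0 < strong_cutoff < weak_cutoff:
        raise ValueError("expected 0 < strong_cutoff < weak_cutoff")
    if evalue < strong_cutoff:
        return "Type 1"  # strong homolog
    if evalue <= weak_cutoff:
        return "Type 2"  # weak homolog
    return "no substantial homology"
```

Because the score depends on database size, as noted above, a given pair of cutoffs is only meaningful for a fixed database.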
A polypeptide "fragment," "portion" or "segment" refers to a stretch of amino acid residues of at least about five to seven contiguous amino acids, often at least about seven to nine contiguous amino acids, typically at least about nine to 13 contiguous amino acids and, most preferably, at least about 20 to 30 or more contiguous amino acids.

A polypeptide "mutein" refers to a polypeptide whose sequence contains substitutions, insertions or deletions of one or more amino acids compared to the amino acid sequence of the native or wild type protein. A mutein has at least 50% sequence homology to the wild type protein; preferred is 60% sequence homology, and more preferred is 70% sequence homology. Most preferred are muteins having 80%, 90% or 95% sequence homology to the wild type protein, in which sequence homology is measured by any common sequence analysis algorithm, such as Gap or Bestfit.

A "derivative" refers to polypeptides or fragments thereof that are substantially homologous in primary structural sequence but which include, e.g., in vivo or in vitro chemical and biochemical modifications or which incorporate unusual amino acids. Such modifications include, for example, acetylation, carboxylation, phosphorylation, glycosylation, ubiquitination, labeling, e.g., with radionuclides, and various enzymatic modifications, as will be readily appreciated by those well skilled in the art. A variety of methods for labeling polypeptides and of substituents or labels useful for such purposes are well known in the art, and include radioactive isotopes such as ¹²⁵I, ³²P, ³⁵S, and ³H, ligands which bind to labeled antiligands (e.g., antibodies), fluorophores, chemiluminescent agents, enzymes, and antiligands which can serve as specific binding pair members for a labeled ligand. The choice of label depends on the sensitivity required, ease of conjugation with the primer, stability requirements, and available instrumentation. Methods for labeling polypeptides are well known in the art. See Ausubel et al., 1992, hereby incorporated by reference.

The term "fusion protein" refers to polypeptides comprising polypeptides or fragments coupled to heterologous amino acid sequences. Fusion proteins are useful because they can be constructed to contain two or more desired functional elements from two or more different proteins. Fusion proteins can be produced recombinantly by constructing a nucleic acid sequence which encodes the polypeptide or a fragment thereof in frame with a nucleic acid sequence encoding a different protein or peptide and then expressing the fusion protein. Alternatively, a fusion protein can be produced chemically by crosslinking the polypeptide or a fragment thereof to another protein.

An "isolated" or "substantially pure" nucleic acid or polynucleotide (e.g., an RNA, DNA or a mixed polymer) is one which is substantially separated from other cellular components that naturally accompany the native polynucleotide in its natural host cell, e.g., ribosomes, polymerases, or genomic sequences with which it is naturally associated. The term embraces a nucleic acid or polynucleotide that has been removed from its naturally occurring environment. The term "isolated" or "substantially pure" also can be used in reference to recombinant or cloned DNA isolates, chemically synthesized polynucleotide analogs, or polynucleotide analogs that are biologically synthesized by heterologous systems.

The term "percent sequence identity" or "identical" in the context of nucleic acid sequences refers to the residues in the two sequences which are the same when aligned for maximum correspondence. The length of sequence identity comparison may be over a stretch of at least about nine nucleotides, usually at least about 20 nucleotides, more usually at least about 24 nucleotides, typically at least about 28 nucleotides, more typically at least about 32 nucleotides, and preferably at least about 36 or more nucleotides. There are a number of different algorithms known in the art which can be used to measure nucleotide sequence identity. For instance, polynucleotide sequences can be compared using Fasta, a program in GCG Version 6.1. Fasta provides alignments and percent sequence identity of the regions of the best overlap between the query and search sequences (Pearson, 1990, herein incorporated by reference). For instance, percent sequence identity between nucleic acid sequences can be determined using Fasta with its default parameters (a word size of 6 and the NOPAM factor for the scoring matrix) as provided in GCG Version 6.1, herein incorporated by reference.
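Once two sequences have been aligned for maximum correspondence (for example by Fasta), percent identity is a count over the alignment columns. The Python sketch below is a minimal illustration, not Fasta's own computation: it assumes the alignment is already given as two equal-length strings with "-" for gaps, and it divides identical columns by all columns that are not gap-against-gap. Other programs use different denominators (e.g., the length of the shorter sequence), so numbers are not directly interchangeable. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry
    the same residue (case-insensitive)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both sequences have a gap; they carry no information.
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    # Gap-gap columns are gone, so x == y implies a matching residue.
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)
```

For instance, the alignment `ACGT-A` / `ACCTTA` has four identical columns out of six, about 66.7% identity.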
The term "substantial homology" or "substantial similarity," when referring to a nucleic acid or fragment thereof, indicates that, when optimally aligned with appropriate nucleotide insertions or deletions with another nucleic acid (or its complementary strand), there is nucleotide sequence identity in at least about 60% of the nucleotide bases, usually at least about 70%, more usually at least about 80%, preferably at least about 90%, and more preferably at least about 95-98% of the nucleotide bases, as measured by any well-known algorithm of sequence identity, such as Fasta, as discussed above.

Alternatively, substantial homology or similarity exists when a nucleic acid or fragment thereof hybridizes to another nucleic acid, to a strand of another nucleic acid, or to the complementary strand thereof, under selective hybridization conditions. Typically, selective hybridization will occur when there is at least about 55% sequence identity (preferably at least about 65%, more preferably at least about 75%, and most preferably at least about 90%) over a stretch of at least about 14 nucleotides. See, e.g., Kanehisa, 1984, herein incorporated by reference.

Nucleic acid hybridization will be affected by such conditions as salt concentration, temperature, solvents, the base composition of the hybridizing species, length of the complementary regions, and the number of nucleotide base mismatches between the hybridizing nucleic acids, as will be readily appreciated by those skilled in the art. "Stringent hybridization conditions" and "stringent wash conditions" in the context of nucleic acid hybridization experiments depend upon a number of different physical parameters. The most important parameters include temperature of hybridization, base composition of the nucleic acids, salt concentration and length of the nucleic acid. One having ordinary skill in the art knows how to vary these parameters to achieve a particular stringency of hybridization. In general, "stringent hybridization" is performed at about 25°C below the thermal melting point (Tm) for the specific DNA hybrid under a particular set of conditions. "Stringent washing" is performed at temperatures about 5°C lower than the Tm for the specific DNA hybrid under a particular set of conditions. The Tm is the temperature at which 50% of the target sequence hybridizes to a perfectly matched probe. See Sambrook et al., page 9.51, hereby incorporated by reference.

The Tm for a particular DNA-DNA hybrid can be estimated by the formula:

$$T_m = 81.5\,^{\circ}\mathrm{C} + 16.6\,\log_{10}[\mathrm{Na}^{+}] + 0.41\,(\%\,\mathrm{G{+}C}) - 0.63\,(\%\ \mathrm{formamide}) - \frac{600}{l}$$

where l is the length of the hybrid in base pairs.

The Tm for a particular RNA-RNA hybrid can be estimated by the formula:

$$T_m = 79.8\,^{\circ}\mathrm{C} + 18.5\,\log_{10}[\mathrm{Na}^{+}] + 0.58\,(\%\,\mathrm{G{+}C}) + 11.8\,(\text{fraction G+C})^{2} - 0.35\,(\%\ \mathrm{formamide}) - \frac{820}{l}$$

The Tm for a particular RNA-DNA hybrid can be estimated by the formula:

$$T_m = 79.8\,^{\circ}\mathrm{C} + 18.5\,\log_{10}[\mathrm{Na}^{+}] + 0.58\,(\%\,\mathrm{G{+}C}) + 11.8\,(\text{fraction G+C})^{2} - 0.50\,(\%\ \mathrm{formamide}) - \frac{820}{l}$$
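The three empirical Tm estimates differ only in their coefficients, so they can share one function driven by a coefficient table. This Python sketch is an illustration under stated assumptions: [Na+] is given in mol/L, the linear G+C terms take G+C content as a percentage (0-100) and the squared term takes it as a fraction (0-1), which is the usual convention for these equations, formamide is in percent, and length is in base pairs. The function and table names are hypothetical.

```python
import math

# Coefficients of the three Tm estimates (hypothetical table layout).
TM_COEFFICIENTS = {
    "DNA-DNA": dict(base=81.5, salt=16.6, gc=0.41, gc2=0.0, form=0.63, length=600.0),
    "RNA-RNA": dict(base=79.8, salt=18.5, gc=0.58, gc2=11.8, form=0.35, length=820.0),
    "RNA-DNA": dict(base=79.8, salt=18.5, gc=0.58, gc2=11.8, form=0.50, length=820.0),
}


def estimate_tm(hybrid: str, na_molar: float, gc_percent: float,
                formamide_percent: float, length_bp: int) -> float:
    """Estimated melting temperature (degrees C) of a hybrid."""
    if na_molar <= 0 or length_bp <= 0 or not 0 <= gc_percent <= 100:
        raise ValueError("invalid salt, length or G+C value")
    c = TM_COEFFICIENTS[hybrid]
    gc_fraction = gc_percent / 100.0
    return (c["base"]
            + c["salt"] * math.log10(na_molar)
            + c["gc"] * gc_percent           # linear term, percent G+C
            + c["gc2"] * gc_fraction ** 2    # quadratic term, fraction G+C
            - c["form"] * formamide_percent
            - c["length"] / length_bp)
```

For example, a 600-bp DNA-DNA hybrid with 50% G+C in 1 M Na+ and no formamide gives an estimated Tm of about 101°C; adding 50% formamide lowers the estimate by 31.5°C.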

In general, the Tm decreases by 1-1.5°C for each 1% of mismatch between two nucleic acid sequences. Thus, one having ordinary skill in the art can alter hybridization and/or washing conditions to obtain sequences that have higher or lower degrees of sequence identity to the target nucleic acid. For instance, to obtain hybridizing nucleic acids that contain up to 10% mismatch from the target nucleic acid sequence, 10-15°C would be subtracted from the calculated Tm of a perfectly matched hybrid, and the hybridization and washing temperatures then adjusted accordingly. Probe sequences may also hybridize specifically to duplex DNA under certain conditions to form triplex or other higher order DNA complexes. The preparation of such probes and suitable hybridization conditions are well known in the art.
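The mismatch rule can be combined with the offsets given earlier for stringent hybridization (about 25°C below Tm) and stringent washing (about 5°C below Tm). The Python sketch below assumes both offsets are applied to the mismatch-adjusted Tm, and it lets the caller pick a drop of 1.0 to 1.5°C per 1% mismatch; the function name is hypothetical.

```python
from typing import Tuple


def stringency_temperatures(tm_perfect: float, mismatch_percent: float,
                            drop_per_percent: float = 1.0
                            ) -> Tuple[float, float, float]:
    """Return (adjusted Tm, hybridization temp, wash temp) in degrees C
    for probes allowed to carry up to mismatch_percent mismatch."""
    if not 1.0 <= drop_per_percent <= 1.5:
        raise ValueError("expected 1.0-1.5 degrees C per 1% mismatch")
    tm = tm_perfect - drop_per_percent * mismatch_percent
    return tm, tm - 25.0, tm - 5.0
```

With a perfectly matched Tm of 90°C and 10% allowed mismatch, the adjusted Tm is 80°C (hybridize near 55°C, wash near 75°C) at 1°C per 1%, or 75°C (50°C and 70°C) at 1.5°C per 1%.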

An example of stringent hybridization conditions for hybridization of complementary nucleic acid sequences having more than 100 complementary residues on a filter in a Southern or Northern blot or for screening a library is 50% formamide/6X SSC at 42°C for at least ten hours. Another example of stringent hybridization conditions is 6X SSC at 68°C for at least ten hours. An example of low stringency hybridization conditions for hybridization of complementary nucleic acid sequences having more than 100 complementary residues on a filter in a Southern or Northern blot or for screening a library is 6X SSC at 42°C for at least ten hours. Hybridization conditions that identify nucleic acid sequences that are similar but not identical can be determined experimentally by changing the hybridization temperature from 68°C to 42°C while keeping the salt concentration constant (6X SSC), or by keeping the hybridization temperature and salt concentration constant (e.g., 42°C and 6X SSC) and varying the formamide concentration from 50% to 0%. Hybridization buffers may also include blocking agents to lower background. These agents are well known in the art. See Sambrook et al., pages 8.46 and 9.46-9.58, herein incorporated by reference.

Wash conditions also can be altered to change stringency conditions. An example of stringent wash conditions is a 0.2x SSC wash at 65°C for 15 minutes (see Sambrook et al. for SSC buffer). Often the high stringency wash is preceded by a low stringency wash to remove excess probe. An exemplary medium stringency wash for duplex DNA of more than 100 base pairs is 1x SSC at 45°C for 15 minutes. An exemplary low stringency wash for such a duplex is 4x SSC at 40°C for 15 minutes. In general, a signal-to-noise ratio of 2x or higher than that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization.

As defined herein, nucleic acids that do not hybridize to each other under stringent conditions are still substantially homologous to one another if they encode polypeptides that are substantially identical to each other. This occurs, for example, when a nucleic acid is created synthetically or recombinantly using a high codon degeneracy as permitted by the redundancy of the genetic code.

The polynucleotides of this invention may include both sense and antisense strands of RNA, cDNA, genomic DNA, and synthetic forms and mixed polymers of the above. They may be modified chemically or biochemically or may contain non-natural or derivatized nucleotide bases, as will be readily appreciated by those of skill in the art. Such modifications include, for example, labels, methylation, substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), pendent moieties (e.g., polypeptides), intercalators (e.g., acridine, psoralen, etc.), chelators, alkylators, and modified linkages (e.g., alpha anomeric nucleic acids, etc.). Also included are synthetic molecules that mimic polynucleotides in their ability to bind to a designated sequence via hydrogen bonding and other chemical interactions. Such molecules are known in the art and include, for example, those in which peptide linkages substitute for phosphate linkages in the backbone of the molecule.

"Conservatively modified variations" or "conservatively modified variants" of a particular nucleic acid sequence refers to nucleic acids that encode identical or essentially identical amino acid sequences or, where no amino acid sequence is encoded, to essentially identical DNA sequences. Due to the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given polypeptide sequence. When a nucleic acid sequence is changed at one or more positions with no corresponding change in the amino acid sequence which it encodes, that mutation is called a "silent mutation." Thus, one species of a conservatively modified variation according to this invention is a silent mutation. Accordingly, every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent mutation or variation.
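Whether a change is silent can be checked by translating both sequences and comparing the products. The Python sketch below builds the standard genetic code from the usual TCAG-ordered 64-letter string; it assumes in-frame, unambiguous DNA (A, C, G, T only; other symbols raise `KeyError`) and does not handle alternative genetic codes. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code: codons enumerated in T, C, A, G order
# (first base slowest), "*" marks stop codons.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_silent_variant(original: str, variant: str) -> bool:
    """True if the sequences differ but encode the same amino acids."""
    if len(original) != len(variant):
        return False
    return (original.upper() != variant.upper()
            and translate(original) == translate(variant))
```

For example, `ATGCTGAAA` and `ATGTTAAAG` both encode Met-Leu-Lys, so the second is a silent variant of the first.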

Furthermore, one of skill in the art will recognize that individual substitutions, deletions, additions and the like, which alter, add or delete a single amino acid or a small percentage of amino acids (less than 5%, more typically less than 1%) in an encoded sequence are "conservatively modified variations" or "conservatively modified variants" where the alterations result in the substitution of one amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art.

The term "antibody" refers to a polypeptide encoded by an immunoglobulin gene, genes, or fragments thereof. The immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon and mu constant regions, as well as a myriad of immunoglobulin variable regions. Light chains are classified as either kappa or lambda. Heavy chains are classified as gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes IgG, IgM, IgA, IgD and IgE, respectively.

Antibodies exist, for example, as intact immunoglobulins or as a number of well-characterized fragments produced by digestion with various peptidases. For example, pepsin digests an antibody below the disulfide linkages in the hinge region to produce F(ab)'2, a dimer of Fab which itself is a light chain joined to VH-CH1 by a disulfide bond. The F(ab)'2 may be reduced under mild conditions to break the disulfide linkage in the hinge region, thereby converting the F(ab)'2 dimer to an Fab' monomer. The Fab' monomer is essentially an Fab with part of the hinge region. See Paul (1993) (incorporated herein by reference) for a detailed description of epitopes, antibodies and antibody fragments. One of skill in the art recognizes that such Fab' fragments may be synthesized de novo either chemically or using recombinant DNA technology. Thus, as used herein, the term antibody includes antibody fragments produced by the modification of whole antibodies or those synthesized de novo. The term antibody also includes single-chain antibodies, which generally consist of the variable domain of a heavy chain linked to the variable domain of a light chain. The production of single-chain antibodies is well known in the art (see, e.g., U.S. Pat. No. 5,359,046). The antibodies of the present invention are optionally derived from libraries of recombinant antibodies in phage or similar vectors (see, e.g., Huse et al. (1989); Ward et al. (1989); Vaughan et al. (1996), which are incorporated herein by reference).

As used herein, "epitope" refers to an antigenic determinant of a polypeptide, i.e., a region of a polypeptide that provokes an immunological response in a host. This region need not comprise consecutive amino acids. The term epitope is also known in the art as "antigenic determinant." An epitope may comprise as few as three amino acids in a spatial conformation which is unique to the immune system of the host. Generally, an epitope consists of at least five such amino acids, and more usually consists of at least 8-10 such amino acids. Methods for determining the spatial conformation of such amino acids are known in the art.
Methods for Analyzing ORF Gene Expression

The cell's ability to monitor its own biochemical ecology may be considered a fully integrated, multi-dimensional set of specific biochemical assays. The data from each individual assay manifest themselves, either directly or indirectly, in a change in expression of a single gene or small set of genes. The individual components of the assaying capabilities of the cell may be extracted by measuring the changes in global gene expression in response to a controlled experimental challenge.

The measurement of global gene expression may be done by a number 
of different methods. One technique is that of hybridization to nucleic acid arrays on 
solid surfaces, such as "gene chips" (Fodor et al., 1991). Another method uses a 
reporter construct in the GRM or an equivalent matrix comprising living cells, 
preferably eukaryotic cells, and more preferably yeast, insect, plant, avian, fish or 
mammalian cultured cells. Other methods include SAGE. 

DNA Chip Technology 

One method for determining comprehensive gene expression profiles is DNA gene chip technology (see, e.g., Fodor et al., 1991). A DNA gene chip can be made comprising a large number of immobilized single-stranded nucleic acids, each of which hybridizes specifically to a gene or its mRNA, representing a particular genome or a significant subset thereof. Messenger RNA molecules extracted from a cell, or cDNA molecules converted from such mRNA molecules, can be labeled. The labeling can be accomplished, for example, radioisotopically or fluorescently by methods well known in the art. These mRNA or cDNA molecules are rendered single-stranded and then allowed to hybridize to the immobilized single-stranded nucleic acids on the gene chip. A computer equipped with a scanner then determines the extent of hybridization, thereby quantitating the amount of mRNA produced for any given gene or genetic sequence.

Profiles of gene expression generated under different conditions or in response to different stimuli, such as treatment with chemical compounds, are produced by treating cells with a compound, isolating the mRNA from the cells, optionally producing cDNA, and then hybridizing it to the single-stranded nucleic acids on the gene chip as discussed above. Preferably, software is used to correlate the expression of each gene on the hybridization chip relative to other genes under different conditions or in response to different treatments (see below).
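One simple way to correlate genes across conditions is to compute a Pearson correlation between their expression profiles and report the genes whose profiles track a gene of interest. The Python sketch below is only an illustration of that idea, not the software referred to above; the profile layout (one list of expression values per gene, one entry per condition), the function names and the 0.9 cutoff are hypothetical choices.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlated_genes(profiles: Dict[str, List[float]], gene: str,
                     threshold: float = 0.9) -> List[str]:
    """Genes whose profile across conditions correlates with `gene`
    at or above `threshold` (a hypothetical cutoff)."""
    ref = profiles[gene]
    return sorted(g for g, p in profiles.items()
                  if g != gene and pearson(ref, p) >= threshold)
```

Genes that are co-induced under the same treatments score near +1, while genes that respond in the opposite direction score near -1.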

Promoter elements from genes of interest that respond to an input 
signal can then be isolated and operatively linked to a reporter gene described above by 
recombinant DNA techniques well known in the art for further characterization. 



Genome Reporter Matrix™ Technology 

An alternative method to DNA gene chip technology is the use of a Genome Reporter Matrix™ (GRM), or an equivalent thereof. The generation of gene expression profiles utilizing the Genome Reporter Matrix™, as described below, is essentially as described in United States Patents 5,569,888 and 5,777,888, both of which are incorporated herein by reference.

The promoter (and optionally, 5' upstream regulatory elements and/or 5' upstream untranslated sequences) of an ORF or a gene from a cellular genome (preferably a eukaryotic genome) is fused to a reporter gene, creating a transcriptional and/or translational fusion of the promoter to the reporter gene. In a preferred embodiment, the genome is that of S. cerevisiae. The promoter and optional additional sequences comprise all the regulatory elements necessary for transcriptional (and optionally translational) control of an attached coding sequence. The reporter gene can be any gene that, when expressed in a suitable host, encodes a product that can be detected by a quantitative assay. Any suitable assay may be used, including but not limited to enzymatic, colorimetric, fluorescence or other spectrographic assays, fluorescence-activated cell sorting assays and immunological assays. Examples of suitable reporter genes include, inter alia, green fluorescent protein (GFP), β-lactamase, lacZ, invertase, membrane-bound proteins (e.g., CD2, CD4, CD8, the influenza hemagglutinin protein, and others well known in the art) against which high-affinity antibodies exist or can be made routinely, and fusion proteins comprising a membrane-bound protein appropriately fused to an antigen tag domain (e.g., hemagglutinin or Myc and others well known in the art). In a preferred embodiment, the reporter protein is GFP from the jellyfish Aequorea victoria. GFP is a naturally fluorescing protein that does not require the addition of any exogenous substrates for activity. The ability to measure GFP fluorescence in intact living cells makes it an ideal reporter protein for the GRM or an equivalent matrix comprising living cells.

In a preferred embodiment, reporter constructs comprise the 5' region of the ORF, comprising the promoter of the ORF and other expression regulatory sequences, and generally the first four codons of the ORF fused in-frame to the green fluorescent protein. In a more preferred embodiment, approximately 1200 base pairs of 5' regulatory sequence are included in each fusion. Only 228 yeast ORFs (3.5%) possess introns. Of these 228 intron-containing ORFs, all but four contain only one intron. In these ORFs, fusions are created two to four codons past (3' to) the splice junction. Therefore, these fusions must undergo splicing in order to create a functional reporter fusion.
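At the sequence level, the preferred fusion design concatenates roughly 1200 bp of upstream sequence, the first four codons of the ORF, and the reporter coding sequence in frame. The Python sketch below assumes an intronless ORF read on the given strand with a known 0-based start coordinate; it only assembles the sequence string and says nothing about cloning. The function name is hypothetical, and the short reporter sequence in the example is a placeholder, not the actual GFP coding sequence.

```python
def build_reporter_fusion(genome: str, orf_start: int, reporter_cds: str,
                          upstream_bp: int = 1200, n_codons: int = 4) -> str:
    """Return upstream region + first n_codons of the ORF + reporter,
    keeping the reporter in frame with the native start codon."""
    if not 0 <= orf_start < len(genome):
        raise ValueError("orf_start outside genome")
    if len(reporter_cds) % 3 != 0:
        raise ValueError("reporter coding sequence length must be a multiple of 3")
    begin = max(0, orf_start - upstream_bp)  # clip at the start of the sequence
    upstream = genome[begin:orf_start]
    native = genome[orf_start:orf_start + 3 * n_codons]
    if len(native) != 3 * n_codons:
        raise ValueError("ORF too short for the requested number of codons")
    return upstream + native + reporter_cds
```

Because exactly `3 * n_codons` native bases precede the reporter, its reading frame matches that of the native start codon; intron-containing ORFs would need the fusion point moved past the splice junction as described above.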

Each reporter is assembled in an episomal yeast shuttle vector (either CEN or 2μ plasmid) or on a yeast integrating vector for subsequent insertion into the chromosomal DNA. In a preferred embodiment, the gene reporter constructs are built using a yeast multicopy vector. A multicopy vector is chosen to facilitate easy transfer of the reporter constructs to many different yeast strain backgrounds. In addition, the vector replicates at an average of 10 to 20 copies per cell, providing added sensitivity for detecting genes that are expressed at a low level. In principle, introducing additional copies of a gene's regulatory region could, through titration of regulatory proteins, disrupt a response of interest. However, in practice this appears not to occur, and efforts to exploit such titration effects have required much higher copy number vectors and have been largely unsuccessful. In another preferred embodiment, the reporter constructs are maintained on episomal plasmids in yeast.

In one embodiment, a plurality (all or a significant subset) of the resulting approximately 6,000 reporter constructs is transformed into a strain of yeast. The resulting strains constitute one embodiment of the Genome Reporter Matrix™. See Example 1.

Profiles are produced by arraying wild type or mutant cells carrying the reporter fusion genes in growth media containing different drugs and chemical compounds and measuring changes in expression of the reporter gene by the appropriate assay (see below). In a preferred embodiment, where the reporter gene is GFP, changes in expression are measured as the amount of green light produced by the cells over time with an automated fluorescence scanner. Alternatively, the drugs or chemical compounds may be added to the yeast cells after they have been arrayed onto growth media, and changes in reporter gene expression then measured by the appropriate assay.

Over 93% of the reporters are detectable over background on rich medium. The reproducibility of individual reporters is high, with expression generally varying by less than 10%. In contrast, hybridization experiments have proven unreliable for effects of less than a factor of two. Figure 64 depicts expression data of the GRM from two independent experiments plotted against each other.

In a preferred embodiment, the GRM is used to obtain gene expression information from a genome. The GRM is preferred to hybridization-based methods of profiling for several reasons. First, because the promoter-reporter fusions include the first four amino acids of the native gene product, the response profiles are composites of both transcriptional and translational effects. The importance of being able to monitor both levels of response is underscored by the experience with bacterial antibiotics. Those antibiotics that work at the translational level have a greater therapeutic performance than those affecting transcription. Because hybridization-based methods can reveal only effects on transcription, profiling with the GRM provides a more complete view of the full spectrum of biological effects induced by exposure to drugs or compounds.

Second, the GRM permits profiling of gene expression changes in living cells, which makes it easy to measure the kinetics of changes in gene response profiles in the same population of cells following exposure to different drugs and chemical agents. Thus, by collecting multiple data sets over time, one can identify the genes that make up primary and secondary responses.

Third, hybridization-based methods require relatively sophisticated
molecular procedures to produce labeled cDNA, followed by a 14-hour hybridization
of labeled cDNA probes to target DNA arrays on slides or chips. The GRM requires
only the ability to produce arrays of colonies and to measure emitted light. These
procedures are easier to scale up in an industrial setting than sophisticated
molecular biology methods, and the data they yield are more straightforward to
produce and more reproducible.

Gene Expression Profiles

Using the reporter construct, gene chip technology, or another method
for obtaining genome-wide gene expression data, the gene expression profile of yeast genes
can be obtained. In a preferred embodiment, either the GRM or gene chip technology
is used. In a more preferred embodiment, the GRM is treated with a number of
pharmaceutical compounds and the resulting expression of the reporter constructs is
analyzed. Generally, for each pharmaceutical compound, the expression of the
reporter constructs in the presence of the compound's vehicle alone is compared to
their expression in the presence of the pharmaceutical compound. The change in
expression of each reporter construct between the absence and presence of the
compound is obtained either by subtracting the baseline level of expression from the
level after treatment or by dividing the level after treatment by the baseline level. By
examining a large number of reporter constructs, one can assign yeast ORFs to functional
groups based upon their expression patterns in response to various pharmaceutical
compounds. These functional groups may provide valuable information about the
function of the yeast proteins as well as their human, non-human mammalian, avian,
fish, insect and plant counterparts.
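The two ways of expressing a change, subtraction and division of the vehicle-only baseline, can be sketched as below. This is a minimal illustration that assumes each condition's readings are stored as a dictionary from reporter name to signal. The reporter names and values are hypothetical.

```python
def expression_changes(vehicle, treated):
    """For each reporter measured under both conditions, return
    (difference, ratio), where
        difference = treated - vehicle   (baseline subtracted)
        ratio      = treated / vehicle   (treated level divided by baseline)
    Reporters with a non-positive baseline are skipped, since a ratio is undefined.
    """
    changes = {}
    for reporter, baseline in vehicle.items():
        if reporter not in treated or baseline <= 0:
            continue
        after = treated[reporter]
        changes[reporter] = (after - baseline, after / baseline)
    return changes


vehicle_only = {"reporter_A": 100.0, "reporter_B": 50.0}
with_compound = {"reporter_A": 400.0, "reporter_B": 25.0}
print(expression_changes(vehicle_only, with_compound))
# {'reporter_A': (300.0, 4.0), 'reporter_B': (-25.0, 0.5)}
```

The ratio form is scale-free. A reporter with low basal expression and one with high basal expression that both quadruple give the same value of 4.0, which makes ratios easier to compare across reporters.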

Preferably, software is used to correlate the expression of each gene in 
the GRM or on the DNA chip relative to other genes under different conditions and in 




wo 00/58521 



PCT/USOO/08604 



response to different pharmaceutical compounds. In one preferred embodiment, the
software is capable of producing a correlation coefficient for each gene's expression
relative to every other gene across all expression profiles in a database. Such analysis
reveals groups of genes that exhibit coordinate regulation (regulons). See, e.g., U.S.
Serial No. 09/076,668, now pending; Eisen et al. (1998); and Tamayo et al. (1999).
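A gene-by-gene correlation table of the kind described here can be sketched with the standard Pearson correlation coefficient. This is a minimal illustration. The cited references and application may use a different similarity metric, such as an uncentered correlation, and the profile layout (one value per condition, in the same order for every gene) is an assumption.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sd_x * sd_y)


def correlation_table(profiles):
    """profiles: gene -> list of expression values (e.g., log ratios), one per
    condition, in the same condition order for every gene.
    Returns a dict keyed by (gene_a, gene_b) in both orders."""
    table = {}
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        table[(a, b)] = table[(b, a)] = r
    return table


profiles = {"geneA": [1.0, 2.0, 3.0], "geneB": [2.0, 4.0, 6.0], "geneC": [3.0, 2.0, 1.0]}
table = correlation_table(profiles)
print(round(table[("geneA", "geneB")], 3), round(table[("geneA", "geneC")], 3))  # 1.0 -1.0
```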

In a preferred embodiment, a gene of unknown function may be placed
into a functional genetic group by the following steps:

a) generating a gene expression profile for Gene X, a gene of
unknown function;

b) comparing the gene expression profile of Gene X with
expression profiles of a plurality of other genes in a database of
compiled gene expression profiles to generate expression
correlation coefficients;

c) identifying, based on their expression correlation coefficients, a
set of genes comprising Gene X that are coordinately expressed;

d) determining whether the genes whose expression is most highly
correlated with that of Gene X belong to a gene regulon
involved in a known biological pathway, or a common set of
biological reactions or functions; and

e) optionally testing the effect on Gene X expression of at least
one altered condition or treatment known to affect the function
to which Gene X has been ascribed.

If Gene X expression is coordinated with expression of the regulon, then Gene X is
placed in the regulon.
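Steps c) and d) can be sketched as a simple rule: rank all genes by their correlation with Gene X, then ask which annotated regulon accounts for most of the top-ranked genes. The `top_k` and `min_fraction` parameters and the majority-style rule are illustrative assumptions, not criteria stated in the text. Gene and regulon names are hypothetical.

```python
def assign_to_regulon(corr_with_x, regulons, top_k=5, min_fraction=0.6):
    """Place Gene X in the regulon best represented among its top_k correlates.

    corr_with_x: gene -> correlation coefficient with Gene X
    regulons:    regulon name -> set of member genes with known function
    Returns the regulon name, or None if no regulon reaches min_fraction.
    """
    top = sorted(corr_with_x, key=corr_with_x.get, reverse=True)[:top_k]
    if not top:
        return None
    best_name, best_fraction = None, 0.0
    for name, members in regulons.items():
        fraction = sum(1 for gene in top if gene in members) / len(top)
        if fraction > best_fraction:
            best_name, best_fraction = name, fraction
    return best_name if best_fraction >= min_fraction else None


corr = {"P1": 0.92, "P2": 0.88, "P3": 0.81, "Q1": 0.10, "Q2": 0.05}
regulons = {"pathway_P": {"P1", "P2", "P3"}, "pathway_Q": {"Q1", "Q2"}}
print(assign_to_regulon(corr, regulons, top_k=3))  # pathway_P
```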


Methods to Identify Potential RIGs

A GRM (or an equivalent) is treated with a large number of chemical
compounds. Regulons are identified as groups of genes that are coordinately regulated
in response to genetic mutations, treatment with compounds, or different environmental
conditions. In a preferred embodiment, regulons are identified using correlation
coefficients assembled by software that performs clustering analysis, such as that described
in U.S. Serial No. 09/076,668, now pending; Eisen et al. (1998); and Tamayo et al.
(1999). In a preferred embodiment, genes that constitute a regulon have a correlation
coefficient of greater than 0.5. In a more preferred embodiment, genes that constitute
a regulon have a correlation coefficient of at least 0.6 or 0.7. In a further preferred
embodiment, genes that constitute a regulon have a correlation coefficient of at least
0.8 or 0.9. The correlation coefficient may be measured by any method of obtaining
correlation coefficients, including, without limitation, the method described in United
States Patent Application Serial No. 09/076,668, now pending, or in Eisen et al.
(1998).
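One simple way to turn a correlation cutoff such as 0.5 into candidate regulons is sketched below. It links every pair of genes whose coefficient exceeds the cutoff and takes connected components. This is equivalent to cutting a single-linkage tree at that level. It is swapped in here for illustration and is not the hierarchical clustering or self-organizing-map procedures of the cited references. Gene names and coefficients are hypothetical.

```python
def threshold_regulons(corr, genes, threshold=0.5):
    """Group genes into candidate regulons by linking every pair whose correlation
    coefficient exceeds `threshold` and taking connected components.

    corr:  dict keyed by (gene_a, gene_b) -> correlation coefficient
    genes: iterable of all gene names
    Returns a list of sets, each with at least two genes.
    """
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for (a, b), r in corr.items():
        if r > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in parent:
        groups.setdefault(find(g), set()).add(g)
    return [group for group in groups.values() if len(group) > 1]


corr = {("A", "B"): 0.9, ("B", "C"): 0.7, ("A", "C"): 0.4, ("C", "D"): 0.1}
genes = ["A", "B", "C", "D"]
print([sorted(g) for g in threshold_regulons(corr, genes)])                 # [['A', 'B', 'C']]
print([sorted(g) for g in threshold_regulons(corr, genes, threshold=0.8)])  # [['A', 'B']]
```

A looser cutoff merges more genes into each group. Raising the cutoff toward 0.8 or 0.9, as in the more preferred embodiments, yields smaller and tighter groups.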

Once genes have been grouped into a regulon, one can
identify potential regulon indicator genes (RIGs), which may or may not be members
of the regulon, pathway, or process for which they serve as indicators. RIGs may be
either characterized or uncharacterized genes,
provided they have certain characteristics. Preferred characteristics include one or more
of the following: 1) its expression profile is sensitive to one or more stimuli; 2) its
expression profile exhibits a large dynamic range in response to one or more stimuli; 3)
its expression profile exhibits a rapid kinetic response to one or more stimuli; 4) its
expression profile is specific to a known biological pathway or a common set of
biological reactions or functions; 5) the regulon indicator gene does not contain
sequences that are problematic for maintaining on plasmids when introduced into host
cells. Most preferably, a RIG's expression is relatively specific for a particular
biochemical pathway or cellular condition, is highly sensitive to small changes in
activation of that pathway or condition, and exhibits a wide dynamic range,
so that the RIG is easier to assay.

A "large dynamic range" is one in which gene expression in response to a stimulus
changes at least four-fold relative to basal expression
in the absence of the stimulus. A response may be either an increase or a decrease in
gene expression. In a preferred embodiment, the response is at least ten-fold over
basal levels. In a more preferred embodiment, the response is at least twenty-fold over
basal levels. In an even more preferred embodiment, the response is at least 100-fold
over basal levels.
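Because a response may be either an increase or a decrease, a fold change can be scored symmetrically: a drop from 100 to 5 counts as 20-fold, the same as a rise from 5 to 100. The sketch below applies the four-, ten-, twenty- and 100-fold tiers defined above under that assumption. The function names are hypothetical.

```python
def fold_change(basal, stimulated):
    """Magnitude of a change in expression, treating a decrease like an increase."""
    if basal <= 0 or stimulated <= 0:
        raise ValueError("expression values must be positive")
    return max(stimulated / basal, basal / stimulated)


def dynamic_range_tier(basal, stimulated):
    """Highest tier (4-, 10-, 20- or 100-fold) that the response meets,
    or None if it is below four-fold."""
    fc = fold_change(basal, stimulated)
    for tier in (100, 20, 10, 4):
        if fc >= tier:
            return tier
    return None


print(dynamic_range_tier(10, 45))   # 4    (4.5-fold induction)
print(dynamic_range_tier(100, 5))   # 20   (20-fold repression)
print(dynamic_range_tier(10, 30))   # None (3-fold is not a large dynamic range)
```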

A "rapid kinetic response" is one in which the response occurs within
one doubling time of the organism after exposure to the
stimulus. In a preferred embodiment, the response occurs in less than 10 minutes. In a
more preferred embodiment, the response occurs in less than one minute.

A "stimulus" or "stimuli" is a chemical compound, a genetic mutation,
or a change in the environment of the cell, including, without limitation, a change in
pH, temperature, osmotic pressure, salinity, light, or gas concentration or partial pressure
(e.g., O2, CO2, CO, or NO).

In order to determine whether a potential RIG is specific for a particular
biochemical pathway or cellular condition, expression of the potential RIG is examined
under all conditions in the expression database. A desirable RIG is one whose
expression is selectively induced or repressed by chemicals or mutations that are
known to affect the process in question. Likewise, a desirable RIG's expression is not
influenced by chemicals or mutations that are known not to affect the process in
question. This analysis indicates whether the RIG also participates in
other cellular processes or biochemical pathways. When a potential RIG is not a
member of a target regulon, pathway, or process, specificity is measured by analyzing
expression under all conditions under which the potential RIG is activated or repressed
to determine whether similar conditions elicit similar responses.

Most preferably, a single RIG is identified that is highly specific to
a particular pathway, i.e., its expression changes only when that
pathway is activated or repressed, but not when other pathways are likewise regulated.
Such a highly specific regulon indicator gene cannot always be found for a pathway of
interest. In such cases, however, more than one RIG may be identified whose
coordinate expression patterns correlate with high specificity to the pathway of interest.
Preferably, the coordinate expression of two RIGs provides such specificity. However,
the present invention is not limited by the number of RIGs identified and used
simultaneously as regulated pathway indicators. Expression of each member of a
plurality of RIGs may independently increase or decrease when the biological pathway
of interest is activated or repressed.

A potential RIG is highly indicative of activation of a particular pathway when it
is activated or repressed to an expression level at least 2-fold higher (or, if the
gene is repressed, lower) than when the pathway is not activated. In a preferred
embodiment, the gene is activated or repressed to an expression level at least 10-fold
higher or lower than in the unactivated pathway. In a more preferred embodiment, the
gene is activated or repressed to an expression level at least 20-fold higher or lower
than in the unactivated pathway. The expression level may be represented as a natural
log ratio of treated/untreated expression values. See Figure 37, for example. In a
preferred embodiment, the natural log ratio of a RIG is greater than 1, more preferably
greater than 2.5, and even more preferably greater than 4.0 when the pathway or
process is activated.
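The natural log ratio and the fold-change thresholds above are related by $\ln(\text{fold})$. The sketch below computes the ratio and applies a cutoff. Using the absolute value, so that repression also counts, is an assumption; the text states the cutoffs for activation. It also converts between the two scales for comparison.

```python
import math


def ln_ratio(treated, untreated):
    """Natural log ratio of treated/untreated expression values."""
    return math.log(treated / untreated)


def meets_ln_threshold(treated, untreated, threshold=1.0):
    """True if |ln(treated/untreated)| exceeds `threshold`.
    Taking the absolute value (so repression counts too) is an assumption."""
    return abs(ln_ratio(treated, untreated)) > threshold


# Fold-change thresholds expressed as natural log ratios, and vice versa.
for fold in (2, 10, 20):
    print(f"{fold}-fold -> ln ratio {math.log(fold):.2f}")
for cutoff in (1.0, 2.5, 4.0):
    print(f"ln ratio {cutoff} -> {math.exp(cutoff):.1f}-fold")
```

Running the loops shows that 2-, 10- and 20-fold changes correspond to ln ratios of about 0.69, 2.30 and 3.00. The ln-ratio cutoffs of 1, 2.5 and 4.0 correspond to about 2.7-, 12.2- and 54.6-fold changes, so each log-ratio tier is somewhat more stringent than the matching fold-change tier.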

In order to determine the dynamic range of a potential RIG, its
expression is examined in response to all the
treatments and mutations in the database. In a preferred embodiment, small changes in
activation of the pathway produce large changes in RIG expression.

In one embodiment of the invention, expression of a regulon indicator
gene correlates with the expression of at least one known gene in a group of
coordinately expressed genes, or provides a measure of the function of a biological
process of interest. The RIG is identified by a method comprising the steps of:

a) comparing gene expression profiles of a plurality of genes in the
database to generate expression correlation coefficients;

b) identifying, based on their relative expression correlation
coefficients, a set of genes that are coordinately expressed;

c) selecting a set of genes from b) which comprises one or more
genes known to function in a particular biological pathway, or a
common set of biological reactions or functions;

d) selecting a member of the set of c) having one or more of the
following characteristics:

1) its expression profile is sensitive to one or more stimuli;

2) its expression profile exhibits a large dynamic range in
response to one or more stimuli;

3) its expression profile exhibits a rapid kinetic response to
one or more stimuli;

4) its expression profile is specific to a known biological
pathway or a common set of biological reactions or
functions;

5) the regulon indicator gene does not contain sequences
that are problematic for maintaining on plasmids when
introduced into host cells.

The RIG may also be co-regulated with one or more genes in the group 
of coordinately expressed genes of c) above. In addition, the RIG may control the 
expression of at least one other gene in the group of coordinately expressed genes of c) 
above. The RIG may be a gene of previously unknown function.
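Step d) can be sketched as ranking members of the selected set by how many of the listed characteristics they satisfy. This is an illustration under stated assumptions. The record fields are hypothetical. Characteristic 1 (sensitivity) is folded into the fold-change field. The 90-minute doubling time used for the "rapid kinetic response" test is a placeholder value, not a figure from the text.

```python
from dataclasses import dataclass


@dataclass
class RigCandidate:
    name: str
    max_fold_change: float   # largest induction or repression seen in the database
    response_minutes: float  # time to a detectable response after stimulation
    pathway_specific: bool   # responds mainly to treatments that hit the pathway
    plasmid_stable: bool     # no sequences that are problematic on plasmids


def criteria_met(c, doubling_time_minutes):
    """Count how many of the preferred RIG characteristics a candidate meets."""
    return sum([
        c.max_fold_change >= 4,                       # large dynamic range
        c.response_minutes <= doubling_time_minutes,  # rapid kinetic response
        c.pathway_specific,                           # specificity
        c.plasmid_stable,                             # maintainable on plasmids
    ])


def rank_candidates(candidates, doubling_time_minutes=90.0):
    """Order candidates from the coordinately expressed set, best first.
    The 90-minute default doubling time is an illustrative assumption."""
    return sorted(candidates,
                  key=lambda c: criteria_met(c, doubling_time_minutes),
                  reverse=True)


pool = [
    RigCandidate("geneA", 2.5, 30, True, True),
    RigCandidate("geneB", 25.0, 8, True, True),
    RigCandidate("geneC", 12.0, 200, False, True),
]
print([c.name for c in rank_candidates(pool)])  # ['geneB', 'geneA', 'geneC']
```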

In another embodiment, the invention provides a method for identifying
a regulon indicator gene in a database of compiled gene expression profiles, wherein
expression of the regulon indicator gene provides a measure of the function of a
biological pathway or process of interest. The method comprises the steps of:

a) examining exemplary expression profiles in response to one or
more chemical or genetic treatments which target the pathway or process of interest to
generate reporter sensitivity data;

b) selecting from a) a set of genes which comprises one or more
genes most significantly affected in response to the treatment or treatments; and

c) selecting at least one gene from b) whose expression profile is
maximized for its specificity and sensitivity to the treatment or class of treatments in a)
compared to its sensitivity to all other treatments in the database.
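Step c) can be sketched with a simple specificity score. The score is the mean absolute log ratio under the targeting treatments minus the mean absolute log ratio under every other treatment in the database. This scoring rule is a hypothetical stand-in, since the text does not specify how specificity and sensitivity are combined. Treatment and gene names are illustrative.

```python
def specificity_score(log_ratios, target_treatments):
    """Mean |log ratio| under treatments that target the pathway minus the mean
    |log ratio| under all other treatments in the database (higher is better).

    log_ratios: treatment name -> log ratio of treated/untreated expression
    """
    on_target = [abs(v) for t, v in log_ratios.items() if t in target_treatments]
    off_target = [abs(v) for t, v in log_ratios.items() if t not in target_treatments]
    if not on_target:
        return float("-inf")  # never measured under a target treatment
    mean_on = sum(on_target) / len(on_target)
    mean_off = sum(off_target) / len(off_target) if off_target else 0.0
    return mean_on - mean_off


def best_indicator(profiles, target_treatments):
    """Pick the gene whose response is most specific to the target treatments."""
    return max(profiles, key=lambda g: specificity_score(profiles[g], target_treatments))


profiles = {
    "gene1": {"drugA": 3.0, "drugB": 2.5, "heat_shock": 0.1},
    "gene2": {"drugA": 3.5, "drugB": 3.0, "heat_shock": 3.0},
}
print(best_indicator(profiles, {"drugA", "drugB"}))  # gene1
```

gene2 responds more strongly to the targeting drugs. However, it responds almost as strongly to an unrelated treatment, so the score favors gene1 as the more specific indicator.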

The regulon indicator gene may be co-regulated with one or more
genes in the set of genes of a), or the regulon indicator gene, upon expression, may control
the expression of at least one other gene in the set of genes of a).
Methods to Identify Potential Target Genes and Targets

A regulon is identified as described above under "Methods to Identify
Potential RIGs." In a preferred embodiment, a regulon will contain both characterized
and uncharacterized genes. In many cases, the characterized genes will have a
common function or will be part of the same biochemical pathway. For instance, a
regulon of the isoprenoid pathway will contain characterized genes involved in sterol
biosynthesis. Uncharacterized genes are then analyzed to determine whether they are
likely to be part of the same biochemical pathway as the characterized genes. The
sequences of uncharacterized genes are compared to the sequences of genes of
known function to determine whether the uncharacterized genes or their gene products
share any motifs with characterized genes.

For instance, uncharacterized genes will be examined for domains
indicating enzymatic functions, including, without limitation, kinase, protease and
phosphorylase activities. Similarly, uncharacterized genes will be examined for
domains indicating that they might be transcription factors, including, without
limitation, zinc finger, PHD, steroid-binding and helix-loop-helix regions. Other
domains of interest include lipid-binding and ATP-binding domains. Uncharacterized
genes will also be examined for sequence similarities to secreted factors and receptors.
In a preferred embodiment, target genes and their encoded target proteins are
previously uncharacterized, are highly correlated with a particular regulon containing
genes for a specific pathway or process, and appear to be or encode an enzyme, secreted
factor, receptor, or transcription factor.

In a preferred embodiment, a novel regulon target gene may be selected
from a database of compiled gene expression profiles by a method
comprising the steps of:

a) comparing gene expression profiles of a plurality of genes in the
database to generate expression correlation coefficients;

b) identifying, based on their expression correlation coefficients, a
set of genes that are coordinately expressed;

c) selecting from b) a set of genes comprising one or more genes
of unknown function and one or more genes known to function
in a particular biological pathway, or a common set of biological
reactions or functions of interest; and

d) selecting from the set of c) at least one gene of unknown
function, Gene X, as a novel regulon target gene, wherein Gene
X is a gene whose expression profile closely correlates with the
expression profiles of the one or more genes of the set of c)
known to function in the particular biological pathway, or
common set of biological reactions or functions of interest.
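Steps c) and d) can be sketched as scoring each gene of unknown function by its mean correlation with the known pathway genes in the set. Genes that reach a cutoff are kept, in ranked order. The 0.7 cutoff and the use of a mean are illustrative assumptions. Gene names and coefficients are hypothetical.

```python
def novel_target_candidates(corr, known_genes, unknown_genes, min_mean_r=0.7):
    """Rank genes of unknown function by their mean correlation with genes known
    to act in the pathway of interest; keep those at or above min_mean_r.

    corr: dict keyed by (gene_a, gene_b) -> correlation coefficient
    """
    scored = []
    for gene in unknown_genes:
        rs = [corr[(gene, k)] for k in known_genes if (gene, k) in corr]
        if not rs:
            continue
        mean_r = sum(rs) / len(rs)
        if mean_r >= min_mean_r:
            scored.append((mean_r, gene))
    return [gene for _, gene in sorted(scored, reverse=True)]


corr = {
    ("X", "P1"): 0.90, ("X", "P2"): 0.80,
    ("Y", "P1"): 0.20, ("Y", "P2"): 0.30,
    ("Z", "P1"): 0.75, ("Z", "P2"): 0.75,
}
print(novel_target_candidates(corr, ["P1", "P2"], ["X", "Y", "Z"]))  # ['X', 'Z']
```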

The method may further comprise the step of generating individual
correlation coefficients between the gene expression profile of Gene X and a plurality
of genes in the database to assess the selectivity of Gene X as a novel regulon target
gene. The method may further comprise the step of determining whether the protein
encoded by Gene X exhibits substantial homology to a human, non-human mammal,
avian, amphibian, fish, insect or plant protein, including, without limitation, the step of
hybridizing Gene X to genomic DNA from human, non-human mammal, avian,
amphibian, fish, insect or plant cells or tissue under low stringency conditions,
comparing the DNA sequence of Gene X to the DNA sequences from other organisms,
or obtaining an amino acid sequence encoded by Gene X and comparing it to amino
acid sequences from other organisms. The DNA or amino acid sequences from other
organisms may be contained within a database, and the DNA or amino acid sequence
encoded by Gene X may be compared to them using a computer algorithm such as
blastp, tblastn, or another algorithm that utilizes string alignments. The method for
identifying a target may further comprise the steps of:

a) disrupting the function of Gene X or its homolog in a yeast cell; and

b) identifying whether the function of Gene X is essential for yeast
germination, vegetative growth, or pseudohyphal or hyphal growth.






In another embodiment of the invention, genes that are regulated by
regulon target genes of yeast or their mammalian homologs may be identified. The
method comprises the steps of:

a) overexpressing the target gene in host cells of a matrix comprising a
plurality of units of cells, the cells in each unit containing a reporter
gene operably linked to an expression control sequence derived from a
gene of a selected organism; and

b) identifying genes that are either induced or repressed by overexpression
of the target gene.

In a preferred embodiment, the target gene is selected from the group
consisting of YMR134w, YER034w, YJL105w, YKL077w, YGR046w, YJR041c,
YER044c and YLR100w and their mammalian homologs.

Methods for Constructing Mutant Yeast Strains 

Once a potential target has been identified, one may disrupt the gene to
determine the effect that inhibiting the gene's activity has on the phenotype of the yeast
cell. There are a number of methods well known in the art by which a person can 
disrupt a particular gene in yeast. One of skill in the art can disrupt an entire gene and 
create a null allele, in which no portion of the gene is expressed. One may also 
produce and express an allele comprising a portion of the gene which is not sufficient 
for gene function. This may be done by inserting a nonsense codon into the sequence 
of the gene such that translation of the mutant mRNA transcript ends prematurely. 
One may also produce and express alleles containing point mutations, individually or in 
combination, that reduce or abolish gene function. 

There are a number of different strategies for creating conditional 
alleles of genes. Broadly, an allele can be conditional for function or expression. An 
example of an allele that is conditional for function is a temperature sensitive mutation 
where the gene product is functional at one temperature but non-functional at another, 
e.g., due to misfolding or mislocalization. One of ordinary skill in the art may produce 
mutant alleles which may have only one or a few altered nucleotides but which encode 
inactive or temperature-sensitive proteins. Temperature-sensitive mutant yeast strains
express a functional protein at permissive temperatures but do not express a functional 
protein at non-permissive temperatures. 

An example of an allele that is conditional for expression is a chimeric 
gene where a regulated promoter controls the expression of the gene. Under one 
condition the gene is expressed and under another it is not. One may replace or alter 
the endogenous promoter of the gene with a heterologous or altered promoter that can 
be activated only under certain conditions. These conditional mutants only express the 
gene under defined experimental conditions. In a preferred embodiment, the gene is 
under the control of a regulated promoter where the gene may be expressed at higher 
or lower levels depending upon the degree of activation of the promoter. For instance, 
a gene under the control of a regulated promoter may be expressed at any level 
between 0 and 100% of wild type expression, such as at 10%, 20%, 50% or 80% of its 
wild-type level. The gene may also be expressed at levels above its usual wild-type
expression (overexpression). All of these methods are well known in the art. For
example, see Stark (1998), Garfinkel et al. (1998), and Lawrence and Rothstein
(1991), herein incorporated by reference.

One having ordinary skill in the art also may decrease expression of a 
gene without disrupting or mutating the gene. For instance, one may decrease the 
expression of a gene by transforming yeast with an antisense molecule or ribozyme 
under the control of a regulated or constitutive promoter (see Nasr et al., 1995, herein 
incorporated by reference). One may introduce an antisense construct operably linked
to an inducible promoter into S. cerevisiae to study the function of a conditional allele 
(see Nasr et al. supra). One problem that may be encountered, however, is that many 
antisense molecules do not work well in yeast, for reasons that are, as yet, unclear (see 
Atkins et al., 1994 and Olsson et al., 1997). 

One may also decrease gene expression by inserting a sequence by 
homologous recombination into or next to the gene of interest wherein the sequence 
targets the mRNA or the protein for degradation. For instance, one can introduce a 
construct that encodes ubiquitin such that a ubiquitin fusion protein is produced. This 
protein is likely to have a shorter half-life than the wild-type protein. See, e.g.,
Johnson et al. (1992), herein incorporated by reference. 

In a preferred mode, a gene of interest is completely disrupted in order
to ensure that there is no residual function of the gene. One can disrupt a gene by
"classical" or PCR-based methods. The "classical" method of gene knockout is
described by Rothstein (1991), herein incorporated by reference. However, it is
preferable to use a PCR-based deletion method because it is faster and less
labor-intensive.

A preferred method to delete a gene is a one-step, polymerase chain
reaction (PCR)-based gene deletion method (Rothstein, 1991). Gene-specific primer
pairs are designed for PCR amplification of the plasmid pFA6a-KanMX4 (Wach et al.,
1994), which teachings are herein incorporated by reference. The 3' ends of the
upstream and downstream gene-specific primers have been designed to include 18
base pairs (bp) and 19 bp, respectively, of nucleotide homology flanking the KanMX
gene of the plasmid pFA6a-KanMX4 template. All of the gene-specific primer pairs
contain these complementary sequences, so that the same plasmid pFA6a-KanMX4
template can be used for all of the first-round PCR reactions. At their 5' ends, the
primers each have gene-specific sequence homologies. The upstream primer contains a
nucleotide sequence which includes the start codon of the gene to be knocked out and
the sequence immediately upstream of the start codon. The downstream primer
contains a nucleotide sequence which includes the stop codon of the gene and the
sequence immediately downstream of the stop codon. For each set of primers, the
gene sequences are derived from the 5' and 3' ends of the target DNA sequence.

The upstream and downstream primers are then used to amplify
pFA6a-KanMX4 by PCR using standard conditions. Hybridization conditions
for individual gene-specific primers can be determined experimentally or estimated by a
number of formulas. One such formula is
$T_m = 81.5 + 16.6\log_{10}[\mathrm{Na}^+] + 0.41(\%\mathrm{G{+}C}) - 600/N$,
where $T_m$ is in °C, $[\mathrm{Na}^+]$ is the molar sodium ion concentration, %G+C is the
percent G+C content of the primer, and $N$ is the primer length in nucleotides. See
Sambrook et al., pages 11.46-11.47. The products of the first-round PCR reactions are
DNA molecules containing the KanMX marker (conferring resistance to the drug G-418
in S. cerevisiae) flanked on both ends by 18 bp of gene-specific sequences.
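The melting-temperature estimate can be computed directly. This is a minimal sketch of the Sambrook formula as written with percent G+C, and the result is only a starting estimate for hybridization conditions. The primer sequence is a hypothetical 60-nt example, not one of the deletion primers.

```python
import math


def gc_percent(seq):
    """Percent G+C content of a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def estimated_tm(seq, na_molar):
    """Tm (deg C) = 81.5 + 16.6*log10([Na+]) + 0.41*(%G+C) - 600/N,
    where N is the oligonucleotide length in nucleotides."""
    n = len(seq)
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent(seq) - 600.0 / n


primer = "ATGC" * 15  # hypothetical 60-nt primer with 50% G+C
print(round(estimated_tm(primer, 0.1), 1))  # 75.4
```

Worked by hand: 81.5 + 16.6(-1) + 0.41(50) - 600/60 = 81.5 - 16.6 + 20.5 - 10 = 75.4 °C.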

The gene-specific flanking sequences are extended during the second
round of PCR reactions. The sequences of the two gene-specific PCR primers are
derived from the 45 bp immediately upstream (including the start codon) and the 45 bp
immediately downstream (including the stop codon) of each gene. Thus, following the
second round of PCR, the product contains the KanMX marker flanked by 45 bp of
gene-specific sequences corresponding to the sequences flanking the gene's ORF. The
PCR products are purified by isopropanol precipitation and shipped with the
analytical primers (see below) to the consortium members on dry ice. The precipitated
PCR products are resuspended in TE buffer (10 mM Tris-HCl [pH 7.6], 1 mM
EDTA).
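The 45-bp gene-specific tails of the second-round primers can be cut from a genomic sequence as sketched below. The sketch makes three assumptions. First, "45 bp immediately upstream (including the start codon)" is read as 42 bp plus the ATG, and likewise the stop codon plus 42 bp downstream. Second, the ORF lies on the given (Watson) strand. Third, the KanMX-annealing 3' portions of the primers are omitted, because their sequences are not given here. Function names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def deletion_flanks(genome, orf_start, orf_end, flank=45):
    """Return the gene-specific 5' tails for the two deletion primers.

    genome:    Watson-strand sequence containing the ORF
    orf_start: 0-based index of the A of the ATG start codon
    orf_end:   index just past the last base of the stop codon
    The upstream tail is the `flank` bases ending with the start codon; the
    downstream tail is the reverse complement of the `flank` bases beginning
    with the stop codon, so it can prime on the opposite strand.
    """
    if genome[orf_start:orf_start + 3].upper() != "ATG":
        raise ValueError("ORF does not begin with ATG")
    upstream_from = orf_start + 3 - flank
    downstream_to = orf_end - 3 + flank
    if upstream_from < 0 or downstream_to > len(genome):
        raise ValueError("not enough flanking sequence")
    upstream = genome[upstream_from:orf_start + 3]
    downstream = genome[orf_end - 3:downstream_to]
    return upstream, reverse_complement(downstream)


# Toy example with 6-nt flanks instead of 45 nt.
toy = "CCCCCATGAAATAAGGGGG"
print(deletion_flanks(toy, orf_start=5, orf_end=14, flank=6))  # ('CCCATG', 'CCCTTA')
```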

The various mutations are constructed in two related Saccharomyces
cerevisiae strains, BY4741 (MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0) and BY4743
(MATa/MATα his3Δ1/his3Δ1 leu2Δ0/leu2Δ0 LYS2/lys2Δ0 met15Δ0/MET15
ura3Δ0/ura3Δ0) (Brachmann et al., 1998). Both of these strains are transformed with
the PCR products by the lithium acetate method as described by Ito et al., 1983, and
Schiestl and Gietz, 1989, herein incorporated by reference. The flanking, gene-specific
yeast sequences target the integration event by homologous recombination to
the desired locus (Figure 1). Transformants are selected on rich medium (YPD)
containing G-418 (Geneticin, Life Technologies, Inc.) as described by Guthrie and Fink,
1991, herein incorporated by reference. Ideally, independent mutations are isolated in
the haploid (BY4741) and the diploid (BY4743) strains. The heterozygous mutant
diploid strain is then sporulated and subjected to tetrad analysis (Sherman, 1991;
Sherman and Wakem, 1991, herein incorporated by reference). This allows for the
isolation of the mutation in a MATα haploid strain. The two independently isolated
MATa and MATα haploid strains are then mated to create a homozygous mutant
diploid strain.
Methods to Characterize Yeast Gene Function

One of skill in the art will recognize that a number of methods can be 
used to characterize the function of a yeast gene. In general, the preferred strategy 
depends upon the assumptions made regarding the function of the gene. For example, 
if one creates a conditional allele of the gene, then one can engineer a mutant strain 
wherein the wild-type allele has been replaced by a conditional allele. See, e.g., Stark
(1998). The strain is constructed and propagated under the permissive condition, and
then the strain is switched to the non-permissive (or restrictive) condition and effects
upon the cell's phenotype are monitored. This can be done in a haploid cell, or in a
diploid cell as either a homozygous or heterozygous mutant. 

A preferred method of characterizing the function of a gene is to
knock out the gene completely and then analyze the knockout yeast strain by tetrad
analysis. This method is preferred because one does not need to be able to engineer a 
conditional allele. Furthermore, as the knockout is a null allele, one is assured that it is 
the null phenotype that is assessed, rather than a phenotype resulting from a potentially 
hypomorphic conditional allele. In addition, a complete knockout of the gene can be 
constructed in a diploid strain where the potentially essential function of the gene is 
complemented by the second copy of the gene. 

Once the knockout has been constructed as a heterozygous mutant, the
effects of the mutation are assessed in the haploid spores. Tetrad analysis of the haploid
spores allows for the genetic characterization of a mutation because one can determine
the phenotype that cosegregates with the knockout marker (G-418 resistance).

Any of a number of different tests can be performed to determine the 
effect of knocking out the selected target gene. For instance, one can determine
whether the yeast cell is more or less responsive to various pharmaceutical compounds
(e.g., see Figure 4), pH, salinity, osmotic pressure, temperature or nutritional
conditions. One can determine whether the knockout results in a different observable 
phenotype (e.g., see Figure 22). In addition, yeast cells can be tested for their ability 
to mate, sporulate and bud relative to a wild type control. Thus, these tests may 
provide important information regarding the function of the target gene. 
Methods to Identify Potential Homologs in Other Organisms

Once a gene has been identified as a potential target, one can determine 
whether the gene from yeast has homologs in other organisms, such as humans, non- 
human mammals, other vertebrates such as fish, insects, plants, or other fungi.

One method of determining whether an S. cerevisiae gene has
homologs is by the use of low stringency hybridization and washing. In general,
genomic DNA or cDNA libraries can be screened using probes derived from the target
S. cerevisiae gene using methods known in the art. See above and pages 8.46-8.49
and 9.46-9.58 of Sambrook et al., 1989, herein incorporated by reference. Preferably,
genomic DNA libraries are screened because cDNA libraries generally will not contain
all the mRNA species an organism can make. Genomic DNA libraries from a variety
of different organisms, such as plants, fungi, insects, and various mammalian species,
are commercially available and can be screened. This method is useful for determining
whether there are homologs in organisms whose DNA sequences have not been
characterized extensively.

A second method of determining whether an S. cerevisiae gene has
homologs is through the use of degenerate PCR. In this method, degenerate
oligonucleotides that encode short amino acid sequences of the S. cerevisiae gene are
made. Methods of preparing degenerate oligonucleotides and using them in PCR to
isolate uncloned genes are well known in the art (see Sambrook, pages 14.7-14.8, and
Crawley et al., 1997, pages 4.2.1-4.2.5, herein incorporated by reference).
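The size of a degenerate primer pool grows multiplicatively with each amino acid it encodes. For that reason, stretches rich in residues with few codons (Met, Trp) are commonly preferred. That heuristic is general practice and is not stated in the text. The sketch below counts the DNA sequences that encode a peptide under the standard genetic code. It then finds the least degenerate window of a given length. The protein sequence is hypothetical.

```python
# Number of codons for each amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide):
    """Number of distinct DNA sequences that encode `peptide`
    (i.e., the number of oligonucleotide species in a fully degenerate primer)."""
    total = 1
    for aa in peptide.upper():
        total *= CODONS_PER_AA[aa]
    return total


def least_degenerate_window(protein, length):
    """Return (start, peptide, degeneracy) for the window of `length` residues
    that requires the fewest oligonucleotide species to encode."""
    best = None
    for i in range(len(protein) - length + 1):
        window = protein[i:i + length]
        d = degeneracy(window)
        if best is None or d < best[2]:
            best = (i, window, d)
    return best


print(degeneracy("MWDK"))                       # 4
print(least_degenerate_window("LSRMWDKLS", 4))  # (3, 'MWDK', 4)
```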

The most preferred method is to compare the sequence of the S. cerevisiae
gene to sequences from other organisms. Either the nucleotide sequence of
the gene or its encoded amino acid sequence is compared to the sequences from other
organisms. Preferably, the encoded amino acid sequence of the yeast gene is compared
to amino acid sequences from other organisms. The sequence of the yeast gene can be
compared by a number of different algorithms well known in the art. In general,
computer programs designed for sequence analysis are used to
compare the sequence of interest to a large database of other sequences. Any
computer program designed for the purpose of sequence comparison can be used in
this method. Some computer programs, such as Fasta, produce results that are
typically presented as "% sequence identity." Other computer programs, such as
blastp, produce results presented as "p-values." Preferably, the target gene sequence
will be compared to other sequences using the blastp algorithm.
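The "% sequence identity" figure reported for an alignment can be illustrated as follows. This is a minimal sketch that assumes the two sequences have already been aligned, with "-" marking gaps. It uses one common convention, counting identical residues over columns where neither sequence has a gap. Fasta, BLAST and other programs each define their own denominator, so their reported values may differ.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns (ignoring columns with a gap in either
    sequence) at which the two residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


print(percent_identity("MKV-LA", "MRVQLA"))  # 80.0
```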
5 Nucleotide and amino acid sequences of target genes may be compared 

to vertebrate sequences, including human and non-human mammalian sequences, as 
well as plant and insect sequences using any one of the large number of programs 
known in the art for comparing nucleotide and amino acid sequences to sequences in a 
database. Examples of such programs are Fasta and blastp, discussed above. 
10 Examples of databases which can be searched include GenBank-EMBL, SwissProt, 

DDBJ, GeneSeq, and EST databases, as well as databases containing combinations of 
these databases. 
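Programs such as Fasta and blastp use substitution matrices, affine gap penalties and search heuristics; the Python sketch below does not reproduce either program. It is a minimal, assumption-laden illustration of how a "% sequence identity" figure can arise: two sequences are globally aligned with a plain Needleman-Wunsch recursion using unit scores (match +1, mismatch -1, gap -1, chosen here for simplicity), and identity is taken as identical aligned positions divided by alignment length, which is only one of several definitions in use.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with linear gap penalties."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of alignment length."""
    aln_a, aln_b = global_align(a, b)
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


print(percent_identity("MKTAYIAK", "MKTAHIAK"))  # 87.5
```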

As a further characterization, any potential homologs from other
organisms can be assessed for their ability to functionally complement the yeast
mutant. This can be achieved by first cloning the homolog into an S. cerevisiae
expression vector by standard methods. This plasmid can then be transformed into the
heterozygous mutant diploid strain. Upon sporulation and tetrad dissection, the ability
of the homolog to complement the yeast function is determined by whether or not the
homolog complements the yeast knockout in the haploid spores and restores wild-type
function to them. The ability of the homolog to complement the yeast mutant would
indicate shared function(s) and suggest that the homolog may be part of a similar
pathway in the other organism.
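As an illustrative sketch only, tetrad results from such a complementation test could be tallied as below. The sketch assumes that the deleted yeast gene is essential, so that viability is the wild-type function being scored: without complementation a heterozygous diploid gives two viable and two inviable spores per tetrad, and tetrads with three or four viable spores suggest that knockout spores were rescued by the plasmid. The function name is hypothetical, and in practice one would also confirm that the surviving knockout spores carry the plasmid marker, which this sketch does not model.

```python
from collections import Counter


def tetrad_summary(viable_per_tetrad):
    """Summarize dissected tetrads given the viable spore count (0-4) of each.

    Assumes an essential gene: 2 viable : 2 inviable is the unrescued pattern.
    """
    if any(n < 0 or n > 4 for n in viable_per_tetrad):
        raise ValueError("each tetrad has between 0 and 4 viable spores")
    counts = Counter(viable_per_tetrad)
    # Three or four viable spores means at least one knockout spore survived.
    rescued = counts[3] + counts[4]
    return {"tetrads": len(viable_per_tetrad),
            "rescued": rescued,
            "complements": rescued > 0}


print(tetrad_summary([2, 3, 4, 2]))
```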

Nucleic Acids, Vectors and Production of Recombinant Polypeptides 

The present invention provides nucleic acids and recombinant DNA
vectors which comprise S. cerevisiae RIG and target gene DNA sequences.
Specifically, vectors comprising all or portions of the DNA sequence of HES1,
YMR134W, YER034W, YJL105W, YKL077W, YGR046W, YJR041C, YER044C and
YLR100W are provided. The vectors of this invention also include those comprising
DNA sequences which hybridize under stringent conditions to the HES1, YMR134W,
YER034W, YJL105W, YKL077W, YGR046W, YJR041C, YER044C and YLR100W gene
sequences, and conservatively modified variations thereof.

The nucleic acids of this invention include single-stranded and double-stranded
DNA, RNA, oligonucleotides, antisense molecules, or hybrids thereof and
may be isolated from biological sources or synthesized chemically or by recombinant
DNA methodology. The nucleic acids, recombinant DNA molecules and vectors of
this invention may be present in transformed or transfected cells, cell lysates, or in
partially purified or substantially pure forms.

DNA sequences may be expressed by operatively linking them to an
expression control sequence in an appropriate expression vector and employing that
expression vector to transform an appropriate unicellular host. Expression control
sequences are sequences which control the transcription, post-transcriptional events
and translation of DNA sequences. Such operative linking of a DNA sequence of this
invention to an expression control sequence, of course, includes, if not already part of
the DNA sequence, the provision of a translation initiation codon, ATG, in the correct
reading frame upstream of the DNA sequence.
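The requirement that the initiation codon lie in the correct reading frame can be illustrated with the following Python sketch, offered only as an example under stated assumptions: the input is a coding fragment that begins on a codon boundary, the standard genetic code applies, and the helper names `prepare_orf` and `translate` are hypothetical. An ATG is prepended only when absent, and a length that is not a multiple of three is rejected because the added ATG would then not share the fragment's frame.

```python
# Standard genetic code in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def prepare_orf(cds):
    """Prepend an ATG if missing and check that it stays in frame."""
    cds = cds.upper()
    if not cds.startswith("ATG"):
        cds = "ATG" + cds
    if len(cds) % 3 != 0:
        raise ValueError("length is not a multiple of 3; ATG would be out of frame")
    return cds


def translate(cds):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate(prepare_orf("GCTAAATGGTAA")))  # MAKW
```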

A wide variety of host/expression vector combinations may be
employed in expressing the DNA sequences of this invention. Useful expression
vectors, for example, may consist of segments of chromosomal, non-chromosomal and
synthetic DNA sequences.

Useful expression vectors for bacterial hosts include bacterial plasmids,
such as those from E. coli, including pBluescript, pGEX-2T, pUC vectors, col E1,
pCR1, pBR322, pMB9 and their derivatives; wider host range plasmids, such as RP4;
phage DNAs, e.g., the numerous derivatives of phage lambda, e.g., NM989, λgt10
and λgt11; and other phages, e.g., M13 and filamentous single-stranded phage DNA.
In yeast, vectors include Yeast Integrating plasmids (e.g., YIp5), Yeast Replicating
plasmids (the YRp and YEp series plasmids), Yeast Centromere plasmids (the YCp
series plasmids), pGPD-2, 2μ plasmids and derivatives thereof, and improved shuttle
vectors such as those described in Gietz and Sugino, Gene 74, pp. 527-34 (1988)
(YIplac, YEplac and YCplac). Expression in mammalian cells can be achieved using a



47 



wo 00/58521 



PCT/USOO/08604 



variety of plasmids, including pSV2, pBC12BI, and p91023, as well as lytic virus
vectors (e.g., vaccinia virus, adenovirus, and baculovirus), episomal virus vectors
(e.g., bovine papillomavirus), and retroviral vectors (e.g., murine retroviruses). Useful
vectors for insect cells include baculoviral vectors and pVL941.

In addition, any of a wide variety of expression control sequences - 
sequences that control the expression of a DNA sequence when operatively linked to 
it - may be used in these vectors to express the DNA sequences of this invention. 
Such useful expression control sequences include the expression control sequences 
associated with structural genes of the foregoing expression vectors. Expression 
control sequences that control transcription include, e.g., promoters, enhancers and 
transcription termination sites. Expression control sequences that control post- 
transcriptional events include splice donor and acceptor sites and sequences that 
modify the half-life of the transcribed RNA, e.g., sequences that direct poly(A) 
addition or binding sites for RNA-binding proteins. Expression control sequences that 
control translation include ribosome binding sites, sequences which direct expression 
of the polypeptide to particular cellular compartments, and sequences in the 5' and 3' 
untranslated regions that modify the rate or efficiency of translation. 

Examples of useful expression control sequences include
the early and late promoters of SV40 or adenovirus, the lac system, the trp system, the
TAC or TRC system, the T3 and T7 promoters, the major operator and promoter
regions of phage lambda, the control regions of fd coat protein, the promoter for
3-phosphoglycerate kinase or other glycolytic enzymes, the promoters of acid
phosphatase, e.g., Pho5, the promoters of the yeast α-mating system, the GAL1 or
GAL10 promoters, and other constitutive and inducible promoter sequences known to
control the expression of genes of prokaryotic or eukaryotic cells or their viruses, and
various combinations thereof. See, e.g., The Molecular Biology of the Yeast
Saccharomyces (eds. Strathern, Jones and Broach), Cold Spring Harbor Lab., Cold
Spring Harbor, N.Y., for details on yeast molecular biology in general and on yeast
expression systems (pp. 181-209) (incorporated herein by reference).
DNA vector design for transfection into mammalian cells should include
appropriate sequences to promote expression of the gene of interest, including:
appropriate transcription initiation, termination and enhancer sequences; efficient RNA
processing signals such as splicing and polyadenylation signals; sequences that stabilize
cytoplasmic mRNA; sequences that enhance translation efficiency (e.g., a Kozak
consensus sequence); sequences that enhance protein stability; and when desired,
sequences that enhance protein secretion. A great number of expression control
sequences (constitutive, inducible and/or tissue-specific) are known in the art and
may be utilized. For eukaryotic cells, expression control sequences typically include a
promoter, an enhancer derived from immunoglobulin genes, SV40, cytomegalovirus,
etc., and a polyadenylation sequence which may include splice donor and acceptor
sites. Substantial progress in the development of mammalian cell expression systems
has been made in the last decade and many aspects of the system are well
characterized.
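As a simplified illustration of the Kozak consensus mentioned above, the Python sketch below checks the two positions generally regarded as most important in vertebrate initiation contexts: a purine (A or G) at position -3 and a G at position +4, numbering the A of the ATG as +1. The three-way "strong / adequate / weak" labelling is a common convention adopted here as an assumption, and `kozak_context` is a hypothetical helper, not a published scoring method.

```python
def kozak_context(mrna, atg_index):
    """Classify the initiation context of the ATG starting at atg_index."""
    seq = mrna.upper().replace("U", "T")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError(f"no ATG at position {atg_index}")
    # Position -3 lies three nucleotides upstream of the A of ATG (+1);
    # position +4 is the nucleotide immediately after the ATG.
    minus3 = seq[atg_index - 3] if atg_index >= 3 else ""
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else ""
    hits = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[hits]


print(kozak_context("GCCACCATGG", 6))  # strong
```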

Preferred DNA vectors also include a marker gene and means for
amplifying the copy number of the gene of interest. DNA vectors may also comprise
stabilizing sequences (e.g., ori- or ARS-like sequences and telomere-like sequences),
or may alternatively be designed to favor directed or non-directed integration into the
host cell genome. In a preferred embodiment, DNA sequences of this invention are
inserted in frame into an expression vector that allows high level expression of an RNA
which encodes a fusion protein comprising the encoded DNA sequence of interest.

Of course, not all vectors and expression control sequences will
function equally well to express the DNA sequences of this invention. Neither will all
hosts function equally well with the same expression system. However, one of skill in
the art may make a selection among these vectors, expression control sequences and
hosts without undue experimentation and without departing from the scope of this
invention. For example, in selecting a vector, the host must be considered because the
vector must be replicated in it. The vector's copy number, the ability to control that
copy number, the ability to control integration, if any, and the expression of any other
proteins encoded by the vector, such as antibiotic or other selection markers, should
also be considered.

In selecting an expression control sequence, a variety of factors should
also be considered. These include, for example, the relative strength of the sequence,
its controllability, and its compatibility with the DNA sequence of this invention,
particularly with regard to potential secondary structures. Unicellular hosts should be
selected by consideration of their compatibility with the chosen vector, the toxicity of
the product coded for by the DNA sequences of this invention, their secretion
characteristics, their ability to fold the polypeptide correctly, their fermentation or
culture requirements, and the ease of purification from them of the products coded for
by the DNA sequences of this invention.

Within these parameters, one of skill in the art may select various 
vector/expression control sequence/host combinations that will express the DNA 
sequences of this invention in fermentation or in other large scale cultures. 

Given the strategies described herein, one of skill in the art can
construct a variety of vectors and nucleic acid molecules comprising functionally
equivalent nucleic acids. DNA cloning and sequencing methods are well known to
those of skill in the art and are described in an assortment of laboratory manuals,
including Sambrook et al., supra, 1989; and Ausubel et al., 1994 Supplement. Product
information from manufacturers of biological, chemical and immunological reagents
also provides useful information.

The recombinant DNA molecules and, more particularly, the expression
vectors of this invention may be used to express the RIG and target genes from S.
cerevisiae as recombinant polypeptides in a heterologous host cell. The polypeptides
of this invention may be full-length or less than full-length polypeptide fragments
recombinantly expressed from the DNA sequences according to this invention. Such
polypeptides include variants and muteins having biological activity. The polypeptides
of this invention may be soluble, or may be engineered to be membrane- or
substrate-bound using techniques well known in the art.
Particular details of the transfection, expression and purification of
recombinant proteins are well documented and are understood by those of skill in the
art. Further details on the various technical aspects of each of the steps used in
recombinant production of foreign genes in mammalian cell expression systems can be
found in a number of texts and laboratory manuals in the art. See, e.g., Ausubel et al.,
1989, herein incorporated by reference.

Transformation and other methods of introducing nucleic acids into a
host cell (e.g., transfection, electroporation, liposome delivery, membrane fusion
techniques, high velocity DNA-coated pellets, viral infection and protoplast fusion) can
be accomplished by a variety of methods which are well known in the art (see, for
instance, Ausubel, supra, and Sambrook, supra). Bacterial, yeast, plant or mammalian
cells are transformed or transfected with an expression vector, such as a plasmid, a
cosmid, or the like, wherein the expression vector comprises the DNA of interest.
Alternatively, the cells may be infected by a viral expression vector comprising the
DNA or RNA of interest. Depending upon the host cell, vector, and method of
transformation used, expression of the polypeptide may be transient or stable, and
constitutive or inducible. One having ordinary skill in the art will be able to decide
whether to express a polypeptide transiently or stably, and whether to express the
protein constitutively or inducibly.

A wide variety of unicellular host cells are useful in expressing the DNA
sequences of this invention. These hosts may include well known eukaryotic and
prokaryotic hosts, such as strains of E. coli, Pseudomonas, Bacillus, Streptomyces,
fungi, yeast, insect cells such as Spodoptera frugiperda (SF9), animal cells such as
CHO, BHK, MDCK and various murine cells, e.g., 3T3 and WEHI cells, African green
monkey cells such as COS 1, COS 7, BSC 1, BSC 40, and BMT 10, and human cells
such as VERO, WI38, and HeLa cells, as well as plant cells in tissue culture.

Expression of recombinant DNA molecules according to this invention
may involve post-translational modification of a resultant polypeptide by the host cell.
For example, in mammalian cells expression might include, among other things,
glycosylation, lipidation or phosphorylation of a polypeptide, or cleavage of a signal
sequence to produce a "mature" protein. Accordingly, the polypeptide expression
products of this invention encompass full-length polypeptides and modifications or
derivatives thereof, such as glycosylated versions of such polypeptides, mature proteins
and polypeptides retaining a signal peptide. The present invention also provides for
biologically active fragments of the polypeptides. Sequence analysis or genetic
manipulation may identify those domains responsible for the function of the protein in
yeast. Thus, the invention encompasses the production of biologically active
fragments. The invention also encompasses fragments of the polypeptides which
would be valuable as antigens for the production of antibodies, or as competitors for
antibody binding.

The polypeptides of this invention may be fused to other molecules,
such as genetic, enzymatic, chemical or immunological markers such as epitope tags.
Fusion partners include, inter alia, myc, hemagglutinin (HA), GST, immunoglobulins,
β-galactosidase, biotin, trpE, protein A, β-lactamase, α-amylase, maltose binding
protein, alcohol dehydrogenase, polyhistidine (for example, six histidines at the amino
and/or carboxyl terminus of the polypeptide), lacZ, green fluorescent protein (GFP),
yeast α-mating factor, GAL4 transcription activation or DNA binding domain,
luciferase, and serum proteins such as ovalbumin, albumin and the constant domain of
IgG. See, e.g., Godowski et al., 1988, and Ausubel et al., supra. Fusion proteins may
also contain sites for specific enzymatic cleavage, such as a site that is recognized by
enzymes such as Factor XIII, trypsin, pepsin, or any other enzyme known in the art.
Fusion proteins will typically be made by recombinant nucleic acid methods, as
described above, by chemical synthesis using techniques such as those described in
Merrifield, 1963, herein incorporated by reference, or by chemical cross-linking.

Tagged fusion proteins permit easy localization, screening and specific
binding via the epitope or enzyme tag. See Ausubel, 1991, Chapter 16. Some tags
allow the protein of interest to be displayed on the surface of a phagemid, such as
M13, which is useful for panning agents that may bind to the desired protein targets.
Thus, fusion proteins are useful for screening potential agents using the proteins
encoded by the target genes.

One advantage of fusion proteins is that an epitope or enzyme tag can
simplify purification. These fusion proteins may be purified, often in a single step, by
affinity chromatography. For example, a His6-tagged protein can be purified on a Ni
affinity column and a GST fusion protein can be purified on a glutathione affinity
column. Similarly, a fusion protein comprising the Fc domain of IgG can be purified
on a Protein A or Protein G column and a fusion protein comprising an epitope tag
such as myc can be purified using an immunoaffinity column containing an anti-c-myc
antibody. It is preferable that the epitope tag be separated from the protein encoded by
the target gene by an enzymatic cleavage site that can be cleaved after purification.
A second advantage of fusion proteins is that the epitope tag can be used to bind the
fusion protein to a plate or column through an affinity linkage for screening targets.
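The arrangement of a polyhistidine tag separated from the target protein by a cleavable site can be sketched at the amino acid sequence level as follows. This Python example is illustrative only: it swaps in the thrombin recognition sequence LVPR^GS (thrombin cleaves between R and G), which is not named in the text above, and the helper names are hypothetical. After cleavage, the released protein retains a short Gly-Ser remnant at its amino terminus.

```python
HIS_TAG = "HHHHHH"          # six-histidine tag for Ni affinity purification
THROMBIN_SITE = "LVPRGS"    # thrombin cuts between R and G (LVPR^GS)


def build_fusion(protein):
    """N-terminal Met, His6 tag, cleavage site, then the target protein."""
    return "M" + HIS_TAG + THROMBIN_SITE + protein


def cleave_thrombin(fusion):
    """Return (tag-containing fragment, released protein) after cleavage."""
    idx = fusion.find(THROMBIN_SITE)
    if idx == -1:
        return fusion, ""
    cut = idx + 4  # position just after LVPR
    return fusion[:cut], fusion[cut:]


fusion = build_fusion("MKTAYIAK")
print(fusion)                   # MHHHHHHLVPRGSMKTAYIAK
print(cleave_thrombin(fusion))  # ('MHHHHHHLVPR', 'GSMKTAYIAK')
```

Placing the site between tag and protein is what allows the tag to be used for single-step affinity capture and then removed, as the paragraph above describes.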

In addition, fusion proteins comprising the constant domain of IgG or
other serum proteins can increase a protein's half-life in circulation for
therapeutic use. Fusion proteins comprising a targeting domain can be used to direct
the protein to a particular cellular compartment or tissue target in order to increase the
efficacy of the functional domain. See, e.g., U.S. Pat. No. 5,668,255, which discloses
a fusion protein containing a domain which binds to an animal cell coupled to a
translocation domain of a toxin protein. Fusion proteins may also be useful for
improving antigenicity of a protein target. Examples of making and using fusion
proteins are found in U.S. Pat. Nos. 5,225,538, 5,821,047, and 5,783,398, which are
hereby incorporated by reference.

Production of Polypeptide Fragments, Derivatives and Muteins and Biological 
Assays Thereof 

Fragments, derivatives and muteins of polypeptides encoded by the RIG
and target genes can be produced recombinantly or chemically, as discussed above.
One can produce fragments of a polypeptide encoded by a target gene by truncating the
DNA encoding the target gene and then expressing it recombinantly. Alternatively,
one can produce a fragment by chemically synthesizing a portion of the full-length
polypeptide. One may also produce a fragment by enzymatically cleaving the
polypeptide. Methods of producing polypeptide fragments are well known in the art
(see, e.g., Sambrook et al. and Ausubel et al., supra).
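Which fragments an enzymatic cleavage would yield can be predicted from the sequence alone. The Python sketch below uses trypsin as an example enzyme (an assumption; any sequence-specific protease could be modeled) and applies the commonly used simplified rule that trypsin cuts on the carboxyl side of lysine or arginine unless the next residue is proline. Real digests may be incomplete, which this idealized model ignores.

```python
def trypsin_digest(protein):
    """Idealized complete tryptic digest: cut after K or R, not before P."""
    fragments = []
    start = 0
    for i in range(len(protein) - 1):  # no cut after the final residue
        if protein[i] in "KR" and protein[i + 1] != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    fragments.append(protein[start:])
    return fragments


print(trypsin_digest("MKTAYIAKQRPLG"))  # ['MK', 'TAYIAK', 'QRPLG']
```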

One may produce muteins of a polypeptide encoded by a target gene by
introducing mutations into the DNA sequence of the gene and then expressing it
recombinantly. These mutations may be targeted, in which particular encoded amino
acids are altered, or may be untargeted, in which random encoded amino acids within
the polypeptide are altered. Muteins with random amino acid alterations can be
screened for a particular biological activity. Methods of producing muteins with
targeted or random amino acid alterations are well known in the art; see, e.g.,
Sambrook et al., Ausubel et al., supra, and U.S. Pat. No. 5,223,408, herein
incorporated by reference. Methods of producing polypeptide derivatives are also well
known in the art; see above.
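A targeted mutation of the kind described above can be modeled in silico before any laboratory work. In the following Python sketch, which is illustrative only, the codon for a chosen residue is replaced with a codon for the new amino acid and the change is reported in the conventional "K2A" notation. The choice of replacement codon (the first one found in the standard table) is an arbitrary assumption; in practice one might prefer host-optimized codons or a change requiring the fewest base substitutions.

```python
# Standard genetic code in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# One codon per amino acid (arbitrary choice: first encountered in the table).
BACK_TABLE = {}
for codon, aa in CODON_TABLE.items():
    BACK_TABLE.setdefault(aa, codon)


def translate(cds):
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def point_mutant(cds, residue, new_aa):
    """Replace the codon of 1-based `residue` with a codon for `new_aa`."""
    if new_aa not in BACK_TABLE or new_aa == "*":
        raise ValueError("new_aa must be a standard amino acid")
    start = 3 * (residue - 1)
    old_aa = CODON_TABLE[cds[start:start + 3]]
    mutant = cds[:start] + BACK_TABLE[new_aa] + cds[start + 3:]
    return mutant, f"{old_aa}{residue}{new_aa}"


mutant, label = point_mutant("ATGAAATGG", 2, "A")
print(mutant, label, translate(mutant))  # ATGGCTTGG K2A MAW
```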

There are a number of methods known in the art to determine whether
fragments, muteins and derivatives of polypeptides encoded by a target gene have the
same, enhanced or decreased biological activity relative to the wild-type polypeptide. One of
the simplest assays involves determining whether the fragment, mutein or derivative
can complement the gene function in a cell which does not contain the target gene.
For instance, one can introduce a DNA encoding a fragment or mutein of a 
polypeptide encoded by a gene into a mutant yeast strain which has the gene of interest 
deleted (see above under "Methods of Producing Mutant Yeast Strains"). If 
introduction of the DNA encoding the fragment or mutein permits the mutant yeast 
strain to regain its wildtype phenotype, then the fragment or mutein is biologically 
active, and complements the deleted gene. 

In one type of screening assay, the target gene or a fragment thereof
can be used as the "bait" in a two-hybrid screen to identify molecules that physically
interact with the protein encoded by the target gene. See Chien et al. (1991).

In addition, one may generate genome expression profiles of yeast
strains to characterize the gene's function. In order to generate such profiles, a
non-functional or conditional allele of the gene in a yeast strain must be produced. The
conditional or non-functional allele may be constructed by any technique known in the
art, including deleting the gene as described above, making a temperature-sensitive
allele of the gene or operably linking the gene to an inducible promoter for regulated
expression. If the yeast strain contains a non-functional allele, a genome expression
profile of the mutant strain is compared to that of a wild-type strain. If the yeast strain
contains a conditional allele, the yeast strain is first grown under the permissive
condition to permit expression of the functional product of the target gene. Then, the
yeast strain is shifted to the nonpermissive condition, in which the product of the target
gene is not made or is non-functional. The genome expression profile of the yeast
strain under the nonpermissive condition may be compared to that of the same yeast
strain grown under permissive conditions or to that of a wild-type yeast strain.
Structure-function studies can be performed wherein a library of mutant forms of the
gene is screened for the ability to complement the knock-out mutant strain.
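The comparison of a mutant (or nonpermissive) profile with a reference profile can be sketched as a fold-change filter. The Python example below is a minimal illustration under stated assumptions: expression values are given per gene as non-negative numbers, a two-fold change is used as the cutoff, and low values are floored at 1.0 to avoid division by zero. The threshold, floor, and function name are hypothetical choices, and no statistical testing across replicates is attempted.

```python
import math


def differential_genes(mutant, reference, fold=2.0, floor=1.0):
    """Return {gene: log2(mutant / reference)} for genes changed >= `fold`."""
    cutoff = math.log2(fold)
    changed = {}
    for gene in mutant.keys() & reference.keys():
        ratio = math.log2(max(mutant[gene], floor) / max(reference[gene], floor))
        if abs(ratio) >= cutoff:
            changed[gene] = ratio  # positive: induced; negative: repressed
    return changed


# Hypothetical expression values for illustration only.
mutant = {"GENE_A": 400.0, "GENE_B": 100.0, "GENE_C": 20.0}
reference = {"GENE_A": 100.0, "GENE_B": 110.0, "GENE_C": 80.0}
print(differential_genes(mutant, reference))
```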

Fragments, muteins and derivatives may also be microinjected into a
mutant yeast strain in which the gene of interest is deleted to determine whether the
introduction of the fragment, mutein or derivative can complement the genetic defect.
Similarly, fragments, muteins and derivatives may be microinjected into other cell types
in which the homologous gene has been deleted.

Finally, if a particular biochemical activity of a polypeptide encoded by
a target gene is known, this activity can be measured for fragments, muteins or
derivatives of the polypeptide. For instance, if a target gene encodes a kinase, one
could measure the kinase activity of the wild-type polypeptide and compare it to the
activity of a fragment, mutein or derivative.
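Such a comparison is usually reported as activity relative to the wild-type polypeptide. The Python sketch below shows one way to express it, assuming a signal-based assay (for example, phosphate incorporation) with a measurable no-enzyme background that is subtracted from both readings. The 20% tolerance used to call "same", "enhanced" or "decreased" is a hypothetical choice, not a value taken from this specification.

```python
def relative_activity(variant_signal, wild_type_signal, background=0.0):
    """Background-corrected variant activity as a percentage of wild type."""
    wt = wild_type_signal - background
    if wt <= 0:
        raise ValueError("wild-type signal must exceed background")
    return 100.0 * (variant_signal - background) / wt


def classify(percent, tolerance=20.0):
    """Label activity relative to wild type using a hypothetical tolerance."""
    if percent > 100.0 + tolerance:
        return "enhanced"
    if percent < 100.0 - tolerance:
        return "decreased"
    return "same"


pct = relative_activity(550, 1000, background=100)
print(pct, classify(pct))  # 50.0 decreased
```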

Production of Antibodies 

The polypeptides encoded by the target genes of this invention may be
used to elicit polyclonal or monoclonal antibodies which bind to the target gene
product or a homolog from another species using a variety of techniques well known
to those of skill in the art. Alternatively, peptides corresponding to specific regions of
the polypeptide encoded by the target gene may be synthesized and used to create
immunological reagents according to well known methods.

Antibodies directed against the polypeptides of this invention are
immunoglobulin molecules or portions thereof that are immunologically reactive with
the polypeptide of the present invention. It should be understood that the antibodies of
this invention include antibodies immunologically reactive with fusion proteins.

Antibodies directed against a polypeptide encoded by a target gene may
be generated by immunization of a mammalian host. Such antibodies may be
polyclonal or monoclonal. Preferably they are monoclonal. Methods to produce
polyclonal and monoclonal antibodies are well known to those of skill in the art. For a
review of such methods, see Harlow and Lane (1988), Yelton et al. (1981), and
Ausubel et al. (1989), herein incorporated by reference. Determination of
immunoreactivity with a polypeptide encoded by a target gene may be made by any of
several methods well known in the art, including by immunoblot assay and ELISA.

Monoclonal antibodies with affinities of 10"* M'^ or preferably 10"^ to 
10''** M'' or stronger are typically made by standard procedures as described, e.g., in 
Harlow and Lane, 1988 or Goding, 1986. Briefly, appropriate animals are selected and 
the desired immunization protocol followed. After the appropriate period of time, the
spleens of such animals are excised and individual spleen cells fused, typically, to
immortalized myeloma cells under appropriate selection conditions. Thereafter, the
cells are clonally separated and the supernatants of each clone tested for their
production of an appropriate antibody specific for the desired region of the antigen. 
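Antibody affinities quoted in units of M^-1 are association constants (Ka); many assay reports instead give the dissociation constant, Kd = 1/Ka, in molar units. The small Python sketch below performs that conversion, assuming a simple 1:1 binding model; the helper names are hypothetical.

```python
def kd_from_ka(ka_per_molar):
    """Dissociation constant (M) from association constant (M^-1), 1:1 binding."""
    if ka_per_molar <= 0:
        raise ValueError("Ka must be positive")
    return 1.0 / ka_per_molar


def molar_to_nanomolar(concentration_molar):
    return concentration_molar * 1e9


# Example: Ka = 1e8 M^-1 corresponds to Kd = 1e-8 M, i.e. about 10 nM.
print(molar_to_nanomolar(kd_from_ka(1e8)))
```

A larger Ka therefore means a smaller Kd, so "stronger" affinity corresponds to a higher number in M^-1 and a lower number in nM.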

Other suitable techniques involve in vitro exposure of lymphocytes to
the antigenic polypeptides, or alternatively, selection of libraries of antibodies in
phage or similar vectors. See Huse et al., 1989. The polypeptides and antibodies of
the present invention may be used with or without modification. Frequently,
polypeptides and antibodies will be labeled by joining, either covalently or
non-covalently, a substance which provides for a detectable signal. A wide variety of
labels and conjugation techniques are known and are reported extensively in both the
scientific and patent literature. Suitable labels include radionuclides, enzymes,
substrates, cofactors, inhibitors, fluorescent agents, chemiluminescent agents, magnetic
particles and the like. Patents teaching the use of such labels include U.S. Patents
3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241,
herein incorporated by reference. Also, recombinant immunoglobulins may be
produced (see U.S. Patent 4,816,567, herein incorporated by reference).

An antibody of this invention may also be a hybrid molecule formed
from immunoglobulin sequences from different species (e.g., mouse and human) or
from portions of immunoglobulin light and heavy chain sequences from the same
species. An antibody may be a single-chain antibody or a humanized antibody. It may
be a molecule that has multiple binding specificities, such as a bifunctional antibody
prepared by any one of a number of techniques known to those of skill in the art,
including the production of hybrid hybridomas, disulfide exchange, chemical
cross-linking, addition of peptide linkers between two monoclonal antibodies, the
introduction of two sets of immunoglobulin heavy and light chains into a particular cell
line, and so forth.

The antibodies of this invention may also be human monoclonal
antibodies, for example those produced by immortalized human cells, by SCID-hu mice
or other non-human animals capable of producing "human" antibodies, or by the
expression of cloned human immunoglobulin genes. The preparation of humanized
antibodies is taught by U.S. Pat. Nos. 5,777,085 and 5,789,554, herein incorporated by
reference.

In sum, one of skill in the art, provided with the teachings of this
invention, has available a variety of methods which may be used to alter the biological
properties of the antibodies of this invention, including methods which would increase
or decrease the stability or half-life, immunogenicity, toxicity, affinity or yield of a
given antibody molecule, or to alter it in any other way that may render it more suitable
for a particular application.
Therapeutic Methods Using Nucleic Acids Encoding Target Genes

Once a target gene has been identified in S. cerevisiae, the gene and its
nucleotide sequence can be exploited in a number of ways depending upon the nature
of the target gene. One method is to use the primary sequence of the target gene itself.
For instance, antisense oligonucleotides can be produced which are complementary to
the mRNA of the target gene. Antisense oligonucleotides can be used to inhibit
transcription or translation of a target yeast gene. Production of antisense
oligonucleotides effective for therapeutic use is well known in the art; see Agrawal et
al., 1998, Lavrovsky et al., 1997, and Crooke, 1998, herein incorporated by reference.
Antisense oligonucleotides are often produced using derivatized or modified
nucleotides in order to increase half-life or bioavailability.
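Because an antisense oligonucleotide must base-pair with its mRNA target, its sequence is the reverse complement of the targeted stretch. The Python sketch below derives an unmodified DNA antisense oligonucleotide from an mRNA window; the function name and window parameters are hypothetical, and the derivatized or modified nucleotides mentioned above (which change the backbone or sugar, not the base-pairing sequence) are not modeled.

```python
# mRNA base -> pairing DNA base in the antisense oligonucleotide.
COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense_oligo(mrna, start, length):
    """DNA oligo complementary to mrna[start:start + length], written 5'->3'."""
    target = mrna.upper()[start:start + length]
    # Reverse so the oligo reads 5'->3' while pairing antiparallel to the mRNA.
    return "".join(COMPLEMENT[base] for base in reversed(target))


print(antisense_oligo("AUGGCUAAA", 0, 9))  # TTTAGCCAT
```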

The primary sequence of the target gene can also be used to design
ribozymes that can target and cleave specific target gene sequences. There are a
number of different types of ribozymes. Most synthetic ribozymes are
hammerhead, Tetrahymena or hairpin ribozymes. Methods of designing and using
ribozymes to cleave specific RNA species are known in the art; see Zhao et al., 1998,
Lavrovsky et al., 1997, and Eckstein, 1997, herein incorporated by reference. Although
hammerhead ribozymes are generally ineffective in yeast (Castanotto et al., 1998),
other types of ribozymes may be effective in yeast, and hammerhead and other types of
ribozymes are effective in other organisms.
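A first step in designing a hammerhead ribozyme against a target sequence is locating candidate cleavage sites. The Python sketch below scans an RNA for NUH triplets (H = A, C or U), 3' of which hammerhead ribozymes are generally reported to cleave, with GUC being the triplet most often chosen. This is an illustrative assumption-based sketch only; it does not evaluate target accessibility, flanking arm design, or cleavage efficiency.

```python
import re

# Lookahead so that overlapping NUH triplets are all reported.
NUH = re.compile(r"(?=([ACGU]U[ACU]))")


def hammerhead_sites(rna):
    """Return (index of H, triplet) for each NUH; cleavage is 3' of H."""
    rna = rna.upper().replace("T", "U")
    return [(m.start() + 2, m.group(1)) for m in NUH.finditer(rna)]


sites = hammerhead_sites("GGUCAAUAG")
print(sites)                                 # [(3, 'GUC'), (7, 'AUA')]
print([s for s in sites if s[1] == "GUC"])   # preferred GUC sites only
```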

As discussed above, one can use target yeast genes to identify
homologous genes in plants and animals, including humans. Therefore, one can design
ribozymes and antisense molecules to these genes from plants and animals, including
humans.

Methods Using Neutralizing Antibodies to Proteins Encoded by Target Genes

The protein encoded by the target gene can be used to elicit neutralizing
antibodies for use as inhibitors of the function of the target protein. An antibody may be an
especially good inhibitor if the target gene of interest encodes a protein which is
expressed on the cell surface, such as an integral membrane protein. Although
polyclonal antibodies may be made, monoclonal antibodies are preferred. Monoclonal
antibodies can be screened individually in order to isolate those that are neutralizing or
inhibitory for the protein encoded by the target gene. Monoclonal antibodies also may
be screened for inhibition of a particular function of a protein. For instance, if it is
known that the target gene in yeast encodes an enzyme, one can identify antibodies
that inhibit the enzymatic activity. Alternatively, if the specific function of a target
gene is unknown, one can measure inhibition of the protein by determining the genome
expression profile for yeast cells contacted with the neutralizing antibody. Similarly,
one can screen antibodies which are directed against animal, plant or human proteins
for inhibition of the protein's activity in appropriate cells.

Monoclonal antibodies which inhibit a target protein in vitro may be
humanized for therapeutic use using methods well known in the art; see, e.g., U.S. Pat.
Nos. 5,777,085 and 5,789,554, herein incorporated by reference. Monoclonal
antibodies may also be engineered as single-chain antibodies for therapeutic use using
methods well known in the art; see, e.g., U.S. Pat. Nos. 5,091,513, 5,587,418,
and 5,608,039, herein incorporated by reference.

Neutralizing antibodies may also be used diagnostically. For instance,
the binding site of a neutralizing antibody to the protein encoded by the target gene can
be used to help identify domains that are required for the protein's activity. The
information about the critical domains of a target protein can be used to design
inhibitors that bind to the critical domains of the target protein. In addition,
neutralizing antibodies can be used to validate whether a potential inhibitor of a target
protein inhibits the protein in in vitro assays.

Methods of Identifying Functional Attributes of the Target 

Once a target gene in yeast is identified, the GRM (or an equivalent) is
used to help identify critical functional attributes of the gene. To determine which
transcripts a target gene modifies, one overexpresses the target gene in the
cells of the GRM. One may also overexpress a conditional allele of the gene in the
cells of the GRM. One then identifies the subset of genes that are either induced or
repressed by overexpression of the target gene. Methods for processing data using the
GRM are also disclosed in United States Patents 5,569,588 and 5,777,888; see also
United States Patent Application Serial No. 09/076,668, now pending. Once the genes
that are regulated by a target gene are identified, this information can be used in a
number of ways to identify potential inhibitors or activators of the target protein.

Alternatively, one may determine the genome expression profile of a cell that has a
mutation in a target gene, or of a cell in which the endogenous target gene has been
replaced either with an altered allele or with the counterpart gene from another species.
Similarly, plant and animal GRMs, including human GRMs, overexpressing target genes can be
used in the same way to identify potential inhibitors or activators of the target protein
in these organisms.

Another method for isolating potential inhibitors or activators of a
target gene is to use information obtained from the "two-hybrid system" to identify and
clone genes encoding proteins that interact with the polypeptide encoded by the target
gene (see, e.g., Chien et al., 1991, incorporated herein by reference). The amino acid
sequences of the polypeptides identified by the two-hybrid system can be used to
design inhibitory peptides against the target protein. The two-hybrid system, using
libraries of the appropriate species, can also be used to identify and clone genes
encoding proteins that interact with the polypeptides encoded by the target genes.

Methods of Using Target Proteins 

Recombinantly expressed target proteins or functional fragments
thereof can be used to screen libraries of natural, semisynthetic or synthetic
compounds. Particularly useful types of libraries include combinatorial small organic
molecule libraries, phage display libraries, and combinatorial peptide libraries.
Methods of determining whether components of the library bind to a particular
polypeptide are well known in the art. In general, the polypeptide target is attached to
a solid support surface by non-specific or specific binding. Specific binding can be
accomplished using an antibody which recognizes the protein and is bound to a solid
support, such as a plate or column. Alternatively, specific binding may be through an
epitope tag, such as a GST fusion binding to a glutathione-coated solid support, or an IgG
fusion protein binding to a Protein A solid support. Alternatively, the recombinantly
expressed protein or fragments thereof may be expressed on the surface of phage, such
as M13. A library in the mobile phase is incubated under conditions that promote specific
binding between the target and a compound. Compounds which bind to the target can
then be identified. Alternatively, the library is attached to a solid support and the
polypeptide target is in the mobile phase.

Binding between a compound and a target can be determined by a
number of methods. The binding can be identified by such techniques as competitive
ELISAs or RIAs, for example, in which the binding of a compound to a target
prevents an antibody to the target from binding. These methods are well known in the
art; see, e.g., Harlow and Lane, supra. Another method is to use a BIAcore instrument
(BIAcore) to measure interactions between a target and a compound using methods
provided by the manufacturer. A preferred method is automated high-throughput
screening; see, e.g., Burbaum et al., 1997, and Schullek et al., 1997, herein
incorporated by reference.

Once a compound that binds to a target is identified, one then
determines whether the compound inhibits the activity of the target. If a biological
function of the target protein is known, one can determine whether the compound
inhibits that biological activity. For instance, if it is known that the
target protein is an enzyme, one can measure the inhibition of enzymatic activity in the
presence of the potential inhibitor.

In a preferred embodiment, the target gene is selected from YMR134w,
YER034w, YJL105w, YKL077w, YGR046w, YJR041c, YER044c and YLR100w and
their mammalian homologs.

Another embodiment of the invention is to use the recombinantly
expressed protein for rational drug design. The structure of the recombinant protein
may be determined using x-ray crystallography or nuclear magnetic resonance (NMR).
Alternatively, one could use computer modeling to determine the structure of the
protein. The structure can be used in rational drug design to design potential inhibitory
compounds of the target (see, e.g., Clackson; Mattos et al.; Hubbard; Cunningham et
al.; Kubinyi; Kleinberg et al.; all herein incorporated by reference).

In another embodiment, potential inhibitors of a regulon target gene can
be identified by the following steps:

a) creating a host cell in which the target gene has been altered or
inactivated by mutation;

b) comparing gene expression profiles in the mutated host cell to those in
a host cell which expresses the normal target gene;

c) identifying one or more potential target-dependent reporter genes
whose expression is altered in the host cell in which the target gene has
been altered or inactivated compared to the host cell which expresses
the normal target gene; and

d) screening one or more compounds for their effects on expression of the
target-dependent reporter gene.

If expression of the target-dependent reporter gene increases in the host
cell harboring an altered or inactivated target gene, then a potential inhibitor of the
regulon target gene will increase expression of the target-dependent reporter gene; and
if expression of the target-dependent reporter gene decreases in the host cell harboring
an altered or inactivated target gene, then a potential inhibitor of the regulon target
gene will decrease expression of the target-dependent reporter gene.
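Steps a) through d) and the direction rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure as practiced: expression levels are taken to be positive reporter fluorescence values keyed by gene name, changes are scored as natural-log ratios (the convention Example 2 uses for the HES1 reporter), and the cutoff of 1.0, the gene names G1 to G3, and the function names `select_reporters` and `behaves_like_inhibitor` are all hypothetical.

```python
import math


def select_reporters(mutant, wild_type, cutoff=1.0):
    """Steps b) and c): compare the mutant host cell with the host cell
    expressing the normal target gene.

    Returns {gene: direction a potential inhibitor should move it}.
    `cutoff` is a hypothetical natural-log-ratio threshold.
    """
    reporters = {}
    for gene, wt_level in wild_type.items():
        mut_level = mutant.get(gene)
        if mut_level is None or mut_level <= 0 or wt_level <= 0:
            continue  # skip genes without a usable pair of measurements
        log_ratio = math.log(mut_level / wt_level)
        if log_ratio >= cutoff:
            reporters[gene] = "increase"  # induced when the target is lost
        elif log_ratio <= -cutoff:
            reporters[gene] = "decrease"  # repressed when the target is lost
    return reporters


def behaves_like_inhibitor(treated, untreated, reporters, cutoff=1.0):
    """Step d): a potential inhibitor should move every selected reporter
    in the same direction as loss of the target gene does."""
    for gene, expected in reporters.items():
        log_ratio = math.log(treated[gene] / untreated[gene])
        if expected == "increase" and log_ratio < cutoff:
            return False
        if expected == "decrease" and log_ratio > -cutoff:
            return False
    return True


# Hypothetical fluorescence values for three reporters.
wild_type = {"G1": 1.0, "G2": 2.0, "G3": 1.0}
mutant = {"G1": 5.0, "G2": 0.5, "G3": 1.1}
reporters = select_reporters(mutant, wild_type)
print(reporters)  # {'G1': 'increase', 'G2': 'decrease'}
print(behaves_like_inhibitor({"G1": 4.0, "G2": 0.3}, {"G1": 1.0, "G2": 1.0}, reporters))  # True
```

Requiring every reporter to move in the expected direction is one conservative reading of step d); a real screen might instead weight reporters by how strongly they respond.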

The method may further comprise the step, performed before step d), of
assessing the specificity of a potential target-dependent reporter gene by comparing
the gene expression profile of the potential target-dependent reporter gene to those of a
plurality of genes in a database of compiled gene expression profiles to generate individual
expression correlation coefficients, wherein a target-dependent reporter gene whose
expression correlates with the expression of the regulon target gene and with few or
no other genes is selected over one whose expression correlates with a
greater number of genes, based on the expression correlation coefficients. The method
may also encompass fusing the upstream sequences that control expression of the
target-dependent reporter genes to a heterologous coding sequence and using the fusion
to screen compounds for potential inhibitors of the regulon target gene, as
discussed above.
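The specificity step can be illustrated with a short Python sketch. The text speaks only of "expression correlation coefficients"; this sketch assumes Pearson correlation across experiments, and the 0.7 threshold, the toy profiles, and the names `pearson` and `rank_by_specificity` are hypothetical. Candidates that do not correlate with the target gene are dropped, and the rest are ordered so that those correlating with the fewest other database genes come first.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def rank_by_specificity(candidates, target, database, threshold=0.7):
    """Keep candidates correlated with the target gene, ordered so that those
    correlating with the fewest other genes come first (ties broken by a
    stronger correlation with the target)."""
    ranked = []
    for cand in candidates:
        profile = database[cand]
        r_target = pearson(profile, database[target])
        if r_target < threshold:
            continue  # not co-regulated with the target; discard
        partners = sum(
            1
            for gene, other in database.items()
            if gene not in (cand, target) and pearson(profile, other) >= threshold
        )
        ranked.append((partners, -r_target, cand))
    ranked.sort()
    return [cand for _, _, cand in ranked]


# Hypothetical profiles over four experiments.
database = {
    "TARGET": [1, -1, 1, -1],
    "R1": [3, -3, 1, -1],
    "R2": [3, -1, 1, -3],
    "Y1": [3, 1, -1, -3],
    "X": [1, 1, -1, -1],
}
print(rank_by_specificity(["R1", "R2", "X"], "TARGET", database))  # ['R1', 'R2']
```

Here R1 and R2 both correlate with the target (about 0.89), but R2 also tracks Y1, so R1 is preferred; X is uncorrelated with the target and is dropped.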

In a preferred embodiment, the target gene is selected from YMR134w,
YER034w, YJL105w, YKL077w, YGR046w, YJR041c, YER044c and YLR100w and
their mammalian homologs.

Pharmaceutical Applications 

Compounds that bind to target proteins or regulate target gene
expression can be tested in yeast cell systems and heterologous host cell systems (e.g.,
human cells) to verify that they do not have undesirable side effects. In addition, the
yeast GRM can be used to make sure that the compounds do not adversely alter gene
transcription. Of course, certain changes in gene
expression may be inevitable, and many of these will not be deleterious to the patient or
host organism. Once lead compounds have been identified, these compounds can be
refined further via rational drug design and other standard pharmaceutical techniques.

The compounds of this invention may be formulated into
pharmaceutical compositions and administered in vivo at an effective dose to treat a
particular disease or condition. Determination of a preferred pharmaceutical
formulation and a therapeutically effective dose regimen for a given application is
within the skill of the art, taking into consideration, for example, the condition and
weight of the patient, the extent of desired treatment and the tolerance of the patient
for the treatment.

Administration of the compounds of this invention, including isolated 
and purified forms, their salts or pharmaceutically acceptable derivatives thereof, may 
be accomplished using any conventionally accepted mode of administration. 

The pharmaceutical compositions of this invention may be in a variety
of forms, which may be selected according to the preferred modes of administration.
These include, for example, solid, semi-solid and liquid dosage forms such as tablets,
pills, powders, liquid solutions or suspensions, suppositories, and injectable and
infusible solutions. The preferred form depends on the intended mode of
administration and therapeutic application. Modes of administration may include oral,
parenteral, subcutaneous, intravenous, intralesional or topical administration.

The compounds of this invention may, for example, be placed into
sterile, isotonic formulations with or without cofactors which stimulate uptake or
stability. The formulation is preferably a liquid, or may be a lyophilized powder. For
example, the inhibitors may be diluted with a formulation buffer comprising 5.0 mg/ml
citric acid monohydrate, 2.7 mg/ml trisodium citrate, 41 mg/ml mannitol, 1 mg/ml
glycine and 1 mg/ml polysorbate 20. This solution can be lyophilized, stored under
refrigeration and reconstituted prior to administration with sterile Water for Injection
(USP).
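Scaling the example formulation buffer to a batch volume is simple arithmetic, sketched below in Python for convenience. The concentrations are the ones given above; the dictionary and the helper name `batch_masses_mg` are illustrative only. The text does not state the hydration state of the trisodium citrate, so the computed masses refer to the salt exactly as listed.

```python
# Formulation buffer composition from the text, in mg per ml of final solution.
FORMULATION_MG_PER_ML = {
    "citric acid monohydrate": 5.0,
    "trisodium citrate": 2.7,
    "mannitol": 41.0,
    "glycine": 1.0,
    "polysorbate 20": 1.0,
}


def batch_masses_mg(volume_ml):
    """Mass of each component (mg) needed for volume_ml of buffer."""
    return {name: conc * volume_ml for name, conc in FORMULATION_MG_PER_ML.items()}


for name, mg in batch_masses_mg(100).items():
    print(f"{name}: {mg:g} mg")
# citric acid monohydrate: 500 mg
# trisodium citrate: 270 mg
# mannitol: 4100 mg
# glycine: 100 mg
# polysorbate 20: 100 mg
```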

Topical administration includes administration to the skin or mucosa,
including surfaces of the lung and eye. Compositions for topical administration,
including those for inhalation, may be prepared as a dry powder which may be
pressurized or non-pressurized. In non-pressurized powder compositions, the active
ingredient in finely divided form may be used in admixture with a larger-sized
pharmaceutically acceptable inert carrier comprising particles having a size, for
example, of up to 100 micrometers in diameter. Alternatively, the composition may be
pressurized and contain a compressed gas, such as nitrogen, or a liquefied gas
propellant. The liquefied propellant medium, and indeed the total composition, is
preferably such that the active ingredient does not dissolve therein to any substantial
extent.

Dosage forms for topical or transdermal administration of a compound
of this invention include ointments, pastes, creams, lotions, gels, powders, solutions,
sprays, inhalants or patches. The active component is admixed under sterile conditions
with a pharmaceutically acceptable carrier and any preservatives or buffers that
may be required. Ophthalmic formulations, ear drops, eye ointments, powders and
solutions are also contemplated as being within the scope of this invention.

The pharmaceutical compositions of this invention may also be
administered using microspheres, microparticulate delivery systems or other sustained
release formulations placed in, near, or otherwise in communication with affected
tissues or the bloodstream. Suitable examples of sustained release carriers include
semipermeable polymer matrices in the form of shaped articles such as suppositories or
microcapsules. Implantable or microcapsular sustained release matrices include
polylactides (U.S. Patent No. 3,773,319; EP 58,481), copolymers of L-glutamic acid
and gamma-ethyl-L-glutamate (Sidman et al., 1985), poly(2-hydroxyethyl
methacrylate) or ethylene vinyl acetate (Langer et al., 1981; Langer, 1982).

The compounds of this invention may also be attached to liposomes,
which may optionally contain other agents to aid in targeting or administration of the
compositions to the desired treatment site. Attachment of the compounds to
liposomes may be accomplished by any known cross-linking agent, such as the
heterobifunctional cross-linking agents that have been widely used to couple toxins or
chemotherapeutic agents to antibodies for targeted delivery. Conjugation to liposomes
can also be accomplished using the carbohydrate-directed cross-linking reagent
4-(4-maleimidophenyl)butyric acid hydrazide (MPBH) (Duzgunes et al., 1992), herein
incorporated by reference.

Liposomes containing pharmaceutical compounds may be prepared by
well-known methods (see, e.g., DE 3,218,121; Epstein et al., 1985; Hwang et al., 1980;
U.S. Patent Nos. 4,485,045 and 4,544,545). Ordinarily the liposomes are of the small
(about 200-800 Angstroms) unilamellar type in which the lipid content is greater than
about 30 mol.% cholesterol. The proportion of cholesterol is selected to control the
optimal rate of MAG derivative and inhibitor release.

The compositions also will preferably include conventional
pharmaceutically acceptable carriers well known in the art (see, e.g., Remington's
Pharmaceutical Sciences, 16th Edition, 1980, Mack Publishing Company). Such
pharmaceutically acceptable carriers may include other medicinal agents, carriers,
genetic carriers, adjuvants, excipients, etc., such as human serum albumin or plasma
preparations. The compositions are preferably in the form of a unit dose and will
usually be administered one or more times a day.

EXAMPLE 1: PREPARATION OF THE GENOME REPORTER MATRIX™

Construction of Reporter Gene Fusions (Method 1)

The regulatory region of each yeast gene was cloned into one of two
vectors, pAB1 or pAB2. The vector pAB1 was constructed in the following manner.
First, the polymerase chain reaction (PCR) was used to amplify the transcriptional
terminator region from the gene PGK1 using the oligonucleotides 5P-PGKTERM
(5'-GATTGAATTCAATTGAAATCGATAG-3') and 3P-PGKTERM
(5'-CCGAGGCGCCGAATTTTCGAGTTAT-3'). The amplified fragment consists of the
263-base-pair region immediately downstream of the PGK1 stop codon, and contains
an EcoRI site at the 5' end and a NarI site at the 3' end. These restriction sites
(GAATTC and GGCGCC, respectively) were engineered into the two PCR primers. The
terminator was then cloned into YIplac211 that had been linearized with EcoRI and NarI,
yielding pAB34. Next, the coding region of the green fluorescent protein (GFP) from
Aequorea victoria was amplified by PCR using the oligonucleotides 5P-GFP-ORF
(5'-CATGTCTAGAGGAGAAGAACTTTTC-3') and 3P-GFP-ORF
(5'-CGCGAATTCCTATTTGTATAGTTCA-3'). Again, these oligonucleotides contain
engineered XbaI (TCTAGA) and EcoRI (GAATTC) sites at the 5' and 3' ends, respectively.
This fragment was cloned into pAB34, linearized with XbaI and EcoRI, to produce pAB35.
Finally, the GFP-PGK terminator fragment was moved into the episomal vector
YEplac195 (9) as an XbaI/NarI fragment, thereby producing pAB1.
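The positions of the engineered sites in the four primers can be confirmed with a short Python scan. The primer sequences are copied from the text; the dictionaries and the helper `find_sites` are illustrative. EcoRI, NarI and XbaI recognition sequences are palindromic, so scanning each primer as written also covers its complementary strand.

```python
# Recognition sequences of the enzymes named in the text (all palindromic).
SITES = {"EcoRI": "GAATTC", "NarI": "GGCGCC", "XbaI": "TCTAGA"}

PRIMERS = {
    "5P-PGKTERM": "GATTGAATTCAATTGAAATCGATAG",
    "3P-PGKTERM": "CCGAGGCGCCGAATTTTCGAGTTAT",
    "5P-GFP-ORF": "CATGTCTAGAGGAGAAGAACTTTTC",
    "3P-GFP-ORF": "CGCGAATTCCTATTTGTATAGTTCA",
}


def find_sites(seq, sites=SITES):
    """Return (enzyme, 0-based start) for every occurrence of each site in seq."""
    hits = []
    for enzyme, motif in sites.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((enzyme, start))
            start = seq.find(motif, start + 1)
    return hits


for name, seq in PRIMERS.items():
    print(name, find_sites(seq))
# 5P-PGKTERM [('EcoRI', 4)]
# 3P-PGKTERM [('NarI', 4)]
# 5P-GFP-ORF [('XbaI', 4)]
# 3P-GFP-ORF [('EcoRI', 3)]
```

Each primer carries exactly one of the expected sites, a few bases from its 5' end, which matches the description above.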

The vector pAB2 is pAB1 with an altered multiple cloning site (MCS).
The new MCS contains 8-base-pair recognition sites for three restriction enzymes.
These larger 8-base-pair recognition sites occur less frequently throughout the yeast
genome than the 6-base-pair sites present in the MCS of pAB1. Thus, using
restriction enzymes that recognize 8-base-pair sequences to clone the various
regulatory regions (with the sites engineered into the PCR primers used to amplify the
regions) minimizes the chance that those sites occur within the regions themselves. To
construct pAB2, pAB1 was linearized with XbaI and SphI, dropping out the existing
MCS, and an adapter containing the new MCS was ligated in. The adapter was made
by hybridizing two oligonucleotides, 8Cutter
(5'-CGGCGCGCCGCGGCCGCATGGCCGGCCAAT-3') and 8CutEnd
(5'-CTAGATTGGCCGGCCATGCGGCCGCGGCGCGCCGCATG-3'). This adapter has
sites for the restriction enzymes FseI (GGCCGGCC), NotI (GCGGCCGC), and AscI
(GGCGCGCC).
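The adapter design can be checked in Python: when the two oligonucleotides anneal, 8CutEnd carries a 5' CTAG tail (compatible with an XbaI-cut end) and a 3' CATG tail (compatible with the 3' overhang left by SphI), and 8Cutter contains the three 8-base-pair sites. The sequences come from the text; the function names are illustrative. The last line shows why 8-base-pair sites are rarer: in random sequence with equal base frequencies, a given 6-base-pair site occurs on average once every 4^6 = 4,096 bp and an 8-base-pair site once every 4^8 = 65,536 bp. Real genomes deviate from this idealized model.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


EIGHT_CUTTER = "CGGCGCGCCGCGGCCGCATGGCCGGCCAAT"          # 8Cutter
EIGHT_CUT_END = "CTAGATTGGCCGGCCATGCGGCCGCGGCGCGCCGCATG"  # 8CutEnd


def unpaired_tails(top, bottom):
    """Find the part of `bottom` that pairs with all of `top` and return
    bottom's unpaired (5' tail, 3' tail)."""
    core = revcomp(top)
    i = bottom.find(core)
    if i == -1:
        raise ValueError("oligonucleotides are not fully complementary")
    return bottom[:i], bottom[i + len(core):]


def expected_spacing(site_length):
    """Mean distance (bp) between occurrences of one site in random sequence."""
    return 4 ** site_length


print(unpaired_tails(EIGHT_CUTTER, EIGHT_CUT_END))  # ('CTAG', 'CATG')
for enzyme, site in [("AscI", "GGCGCGCC"), ("NotI", "GCGGCCGC"), ("FseI", "GGCCGGCC")]:
    print(enzyme, EIGHT_CUTTER.find(site))  # AscI 1, NotI 9, FseI 19
print(expected_spacing(6), expected_spacing(8))  # 4096 65536
```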

The promoter regions were cloned by PCR amplification of genomic DNA
prepared from JRY147 (MATa SUC2 mal mel gal2 CUP1), a strain derived from S288c.
The promoter-specific primers were designed such that the proximal primer
spanned the start codon of the specific gene and included a few (usually four) codons
derived from the gene. The position of the distal primer was determined on a
case-by-case basis, depending on the distance to, and orientation of, the neighboring open
reading frame (ORF) and the restriction sites present. Where the upstream ORF was
positioned in a divergent orientation and within 1,200 base pairs, the size of the
promoter fragment amplified was adjusted such that all nucleotides up to, but not
including, the start codon of the upstream ORF were present. In cases where the
upstream ORF was situated in the same orientation, the amplified fragment was
designed to extend into the coding region but not so as to include the start codon.
Both primers had restriction enzyme recognition sites engineered into the ends to allow
the subsequent cloning of the PCR fragment into pAB1 or pAB2.

Construction of Reporter Gene Fusions (Method 2) 

In another method for constructing genome reporter constructs, a
vector comprising a marker gene having an amber mutation and a supF tRNA gene
which suppresses the amber mutation is used as the parent vector.

A plasmid cloning vector was constructed which comprises a mutant
β-lactamase gene with an amber mutation and a supF tRNA gene. Downstream of the
supF tRNA gene there is a "stuffer" DNA fragment which is flanked by BsmBI
restriction sites. The BsmBI restriction enzyme cuts outside of its six-base-pair
recognition sequence (see, e.g., New England Biolabs 96/97 Catalog, p. 23) and
creates a four-nucleotide 5' overhang. When the plasmid cloning vector is digested
with BsmBI, the enzyme cleaves within the stuffer DNA and within the adjoining
tRNA gene, deleting the four 3'-terminal nucleotides of the gene. The deleted supF
tRNA gene encodes a tRNA which cannot fold correctly and is non-functional, i.e., it
cannot suppress the amber mutation in the mutant β-lactamase gene (β-lactamase
(amber)). Downstream from the stuffer DNA fragment is the coding region of a
modified green fluorescent protein ("GFP") gene.

WO 00/58521   PCT/US00/08604

The stuffer DNA was excised from the vector by digestion with BsmBI.
The double-stranded DNA at the supF-stuffer fragment junction, produced by BsmBI
digestion, is shown below. The tRNA gene sequences are indicated in bold:

    5' ..supF..TC CCCCGGAGACGTC..stuffer..
       ..AGGGGG C CTCTGCA G..5'
                  BsmBI

The 3'-terminal sequence of the supF gene necessary for proper
function is TCCCCCACCA. Once cleaved with BsmBI, the vector lacks the supF
tRNA ACCA terminal nucleotides if the overhangs self-anneal during
recircularization of the plasmid in the absence of an insert.
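The way an insert restores the functional supF terminus can be traced as string arithmetic. This Python sketch assumes the junction shown above: the vector top strand ends in ...TC at the cut, the vector bottom strand carries a 5' GGGG overhang (written 5' to 3'), and the insert top strand begins with oligonucleotide #1's CCCCACCA. The trailing "GATC" is a placeholder for insert sequence, and the function names are hypothetical. The sketch checks sequence logic only, not ligation efficiency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUPF_3PRIME_END = "TCCCCCACCA"  # functional supF 3' terminus, from the text


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ligate_left_junction(vector_top, vector_bottom_overhang, insert_top, overhang_len=4):
    """Join the supF side of the BsmBI-cut vector to an insert.

    vector_top: vector top strand, 5'->3', ending at the BsmBI cut
    vector_bottom_overhang: the vector's 5' overhang, written 5'->3'
    insert_top: insert top strand, 5'->3', starting with its own 5' overhang
    """
    insert_overhang = insert_top[:overhang_len]
    if revcomp(insert_overhang) != vector_bottom_overhang:
        raise ValueError("overhangs are not complementary")
    return vector_top + insert_top


def restores_supf(top_strand):
    """True if the top strand now carries the functional supF terminus."""
    return SUPF_3PRIME_END in top_strand


junction = ligate_left_junction("TC", "GGGG", "CCCCACCA" + "GATC")
print(junction, restores_supf(junction))  # TCCCCCACCAGATC True
```

Only molecules in which the ACCA bases are restored encode a functional supF tRNA, which is what links insert capture to ampicillin resistance later in this Example.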

A DNA insert containing the upstream regulatory sequence from a
yeast ORF was generated as a PCR fragment. Two oligonucleotides were designed to
flank the DNA insert sequences of interest on a template DNA and to anneal to opposite
strands of the template DNA. These oligonucleotides also contained a sequence at
their respective 5' ends that, when converted into a 5' overhang (in the double-stranded
PCR fragment generated using the oligonucleotides), is complementary to the
overhangs on the cloning vector generated by BsmBI endonucleolytic cleavage.

Oligonucleotide #1 comprises the 5'-terminal sequence 5' CCCCACCA....
The remaining nucleotides 3' to this sequence were designed to anneal to
sequences at one end of the DNA insert of choice; in this Example, to one of a
multitude of yeast expression control sequences.

Oligonucleotide #1 thus comprises the bases (ACCA) needed to restore the
wild-type 3'-terminal end of the supF tRNA gene. These bases are located immediately
3' to the sequence (CCCC) that allows the insert to anneal to the overhang in the
BsmBI-digested pAB4 vector.

Oligonucleotide #2 comprises the 5'-terminal sequence 5' TCCTG....
The remaining nucleotides 3' to this sequence were designed to anneal to sequences at
the other end of the DNA insert of choice; in this Example, to one of a variety of yeast
expression control sequences which may be used according to this invention.

The DNA template (S. cerevisiae genomic DNA) and the two
oligonucleotides were annealed, and the hybrids were amplified by polymerase chain
reaction using Klentaq™ polymerase and PCR buffer according to the manufacturer's
instructions (Clontech). Briefly, 15 ng of S. cerevisiae genomic DNA served as template
DNA in a 10 µl PCR reaction containing 0.2 mM dNTPs, PCR buffer, Klentaq™
polymerase, and 1 µl of an 8 µM solution containing the primer pairs. The PCR
reaction mixture was subjected to the following steps: a) 94°C for 3 min; b) 94°C for
15 sec; c) 52°C for 30 sec; d) 72°C for 1 min 45 sec; and e) 4°C indefinitely. Steps
b) through d) were repeated for a total of 30 cycles. The PCR amplification product
was purified away from other components of the reaction by standard methods.
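The reaction set-up and cycling program above imply two quick numbers, worked out in this Python sketch: the final primer concentration (1 µl of an 8 µM stock in 10 µl gives 0.8 µM) and the total programmed hold time (3 min plus 30 cycles of 15 s + 30 s + 105 s, which is 78 min). The text does not say whether 8 µM refers to each primer or to the pair combined, and instrument ramp times are not counted, so treat these figures as approximations. The constant and function names are illustrative.

```python
# Reaction set-up from the text.
PRIMER_STOCK_UM = 8.0     # 8 uM primer-pair solution
PRIMER_VOLUME_UL = 1.0    # 1 ul added per reaction
REACTION_VOLUME_UL = 10.0

# (step, temperature in C, hold in seconds, times run); the final 4 C hold is
# indefinite and is left out.
PROGRAM = [
    ("initial denaturation", 94, 180, 1),
    ("denaturation", 94, 15, 30),
    ("annealing", 52, 30, 30),
    ("extension", 72, 105, 30),
]


def final_primer_um():
    """Primer concentration in the assembled reaction (uM)."""
    return PRIMER_STOCK_UM * PRIMER_VOLUME_UL / REACTION_VOLUME_UL


def hold_time_min():
    """Total programmed hold time in minutes, excluding ramps."""
    return sum(seconds * times for _, _, seconds, times in PROGRAM) / 60


print(final_primer_um(), hold_time_min())  # 0.8 78.0
```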

To generate the desired 5' overhangs on the ends of the PCR
amplification product, the PCR fragment was treated with DNA polymerase I in the
presence of dTTP and dCTP. Under these conditions, DNA polymerase I fills in recessed
3' ends with its 5' to 3' polymerase activity and also generates 5' overhangs with its
3' to 5' exonucleolytic activity, which, in the presence of excess dTTP and dCTP,
removes nucleotides in a 3' to 5' direction until a thymidine or a cytosine, respectively,
is removed and then replaced.
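The net effect of this trimming reaction can be modeled as a simple rule: starting from a strand's 3' terminus, bases are removed until the terminal base is one the polymerase can put back (T or C, since only dTTP and dCTP are supplied). This Python sketch models only the outcome, not the enzyme kinetics. The strands are written 5' to 3' with the 3' end at the DNA terminus; reading end b as a strand ending ...CAGGA at its 3' terminus is an interpretation of the diagram below, and the "ATCG" and "GG" prefixes are placeholders for insert sequence.

```python
def chew_back_3prime(strand, replaceable=frozenset("TC")):
    """Net effect of the 3'->5' exonuclease of DNA polymerase I when only
    dTTP and dCTP are present.

    strand: one strand written 5'->3', whose 3' end is at the DNA terminus.
    Terminal bases are removed until the 3'-terminal base is one the enzyme
    can replace; there it idles (removing and re-adding), so that base stays.
    Returns (retained strand, removed bases).
    """
    end = len(strand)
    while end > 0 and strand[end - 1] not in replaceable:
        end -= 1
    return strand[:end], strand[end:]


# End a): strand complementary to 5' CCCCACCA..., read 5'->3'.
print(chew_back_3prime("ATCG" + "TGGTGGGG"))  # ('ATCGTGGT', 'GGGG')
# End b): strand ending ...CAGGA at its 3' terminus.
print(chew_back_3prime("GG" + "CAGGA"))  # ('GGC', 'AGGA')
```

In both cases four bases are removed, leaving four-nucleotide 5' overhangs (CCCC at end a, TCCT at end b) on the opposite strand.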

The overhangs generated by this reaction are:

a) At the 5' end (supF tRNA-restoring end) of the DNA insert:

    5' CCCCACCA..    becomes    5' CCCCACCA..
       GGGGTGGT..                      TGGT..

b) At the 3' end of the DNA insert (joined to the GFP coding sequence):

    5' CAGGA..       becomes    5' C
       GTCCT..                     GTCCT..

This DNA insert, now comprising 5' overhangs compatible with each of the
two ends of the BsmBI-cleaved pAB4 vector, was used as the substrate in a
standard ligation reaction with the BsmBI-cleaved pAB4 vector. The resulting ligation
mixture was used to transform competent E. coli cells. The cells were plated on agar
plates in the presence of ampicillin.

Colonies that grew in the presence of ampicillin produced
functional β-lactamase enzyme, and each harbored the desired recombinant DNA
molecule, having a DNA insert with a yeast expression control sequence inserted
upstream of the modified GFP coding region. The supF gene on vectors which
religated without a DNA insert did not express a functional supF tRNA, and these vectors
did not make functional β-lactamase. Thus, they were not found in transformed host cells
grown on ampicillin.

Construction of Yeast Strains

Strain ABY11 (MATa leu2Δ1 ura3-52) of S. cerevisiae was used.
ABY11 is derived from S288c. GRM arrays were grown at 30°C on solid casamino
acid medium (Difco) with 2% glucose and 0.5% UltraPure Agarose (Gibco BRL).
The medium was supplemented with additional amino acids and adenine (Sigma) at the
following concentrations: adenine and tryptophan at 30 µg/ml; histidine, methionine,
and tyrosine at 20 µg/ml; leucine and lysine at 40 µg/ml. Stock solutions of the
supplements were made at 100x concentrations in water. Yeast cells were transformed
with the reporter plasmids prepared by Method 1 or Method 2 (above) by the lithium
acetate method (Ito et al., 1983; Schiestl and Gietz, 1989).

Determinations of Reporter Gene Expression Levels 

Solutions of test compounds were added directly to the yeast strains or
were coated on plates prior to addition of the yeast strains. The individual strains
comprising the GRM were maintained as independent colonies (and cultures) in a
96-well format, in medium selecting for the URA3-containing reporter plasmid. Prior to
each experiment, fresh dilutions of the reporter-containing strains were inoculated and
grown overnight at 30°C. A Hamilton MicroLab 4200, a multichannel gantry robot
equipped with a custom pin tool device capable of dispensing 50-nanoliter volumes in a
highly reproducible manner, was used to array the matrix of yeast strains uniformly
onto solid agar growth media at a density of 1536 reporter strains per 110 cm
plate. Arraying 50 nanoliters of each yeast liquid culture onto solid medium with the
Hamilton MicroLab 4200 gives colony-to-colony signal variation of less than
5%. Once arrayed, each plate was grown at 30°C for 18 hours or at 25°C
for 24 hours.

The level of fluorescence expressed from each reporter gene fusion was
determined using a Molecular Dynamics FluorImager SI, and AIS image analysis software
(Imaging Research, Ontario, CA) was used to quantitate the fluorescence of each
colony in the images. Generally, the drug treatments were performed at several
concentrations, with the analysis based upon the concentration producing the most
informative expression profile.

EXAMPLE 2: IDENTIFICATION OF HES1 AS A REGULON INDICATOR GENE

The effects of Simvastatin on the Genome Reporter Matrix™ were
tested at a concentration of 20 µg/ml. The HES1 reporter gene construct was induced
by a natural log ratio of 4.2 (treated/untreated), indicating that the HES1 reporter
responds to Simvastatin with an excellent signal-to-noise ratio. The HES1 gene
encodes a protein with significant similarity to oxysterol-binding
proteins and has been implicated in isoprenoid metabolism (Figure 35). Analysis of
gene expression data with the Genome Reporter Matrix™ revealed that HES1
expression is highly correlated with expression of genes encoding enzymes of the
isoprenoid biosynthetic pathway (Figure 36).

The specificity of the HES1 reporter for inhibitors of ergosterol
biosynthesis was tested in silico. The expression of the HES1 reporter was examined
in data from 710 experimental treatments of the Genome Reporter Matrix™. Basal
levels of HES1 reporter gene expression were 0.1 units. Units are defined as an
arbitrary fluorescence value that has been normalized such that a value of 1.0 equals the
mean reporter fluorescence level of all members of the Genome Reporter Matrix™ in a
given experiment. All treatments (a total of 51) that induced HES1 reporter gene
levels to 0.5 units or greater were treatments known to inhibit ergosterol biosynthesis,
indicating a high degree of specificity for this pathway (Figure 37).
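The unit definition and the 0.5-unit cutoff can be made concrete with a short Python sketch. It assumes that "mean reporter fluorescence level of all members" means the arithmetic mean of the raw colony fluorescence of every reporter in one experiment. That reading, the reporter names other than HES1, the treatment names, and the numbers are hypothetical. The last line converts the natural-log ratio of 4.2 reported for Simvastatin into a fold change.

```python
import math


def normalize(plate):
    """Convert raw fluorescence to units in which 1.0 is the mean of all
    reporters measured in the same experiment."""
    mean = sum(plate.values()) / len(plate)
    return {reporter: value / mean for reporter, value in plate.items()}


def flag_treatments(experiments, reporter, cutoff=0.5):
    """Return the treatments in which `reporter` reaches at least `cutoff` units."""
    return [
        name
        for name, plate in experiments.items()
        if normalize(plate)[reporter] >= cutoff
    ]


# Hypothetical raw readings for a four-reporter experiment.
experiments = {
    "untreated": {"A": 1.0, "B": 3.0, "C": 3.8, "HES1": 0.2},
    "inhibitor": {"A": 1.0, "B": 1.0, "C": 1.0, "HES1": 2.0},
}
print(round(normalize(experiments["untreated"])["HES1"], 3))  # 0.1
print(flag_treatments(experiments, "HES1"))  # ['inhibitor']
# A natural-log ratio of 4.2 corresponds to roughly a 67-fold change.
print(round(math.exp(4.2)))  # 67
```

Because each experiment is normalized to its own mean, a treatment that shifts all reporters equally does not move HES1 across the cutoff; only a relative increase does.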

The utility of the HES1 reporter gene in a high-throughput screen was
tested by incubating a yeast strain harboring the HES1 reporter in a 384-well array
containing various concentrations of ergosterol biosynthesis inhibitors (Econazole and
Simvastatin) and nonspecific drugs (Flucytosine and Nifedipine). Cells were grown to
mid-log phase at 30°C in casamino acids medium (0.67% yeast nitrogen base, 2%
glucose, 2% casamino acids). Cell density was adjusted prior to incubation in various
concentrations of drug. Arrays were incubated at 30°C for 24 hours prior to imaging.
The HES1 reporter was found to be specifically induced by Econazole and Simvastatin
but not by Flucytosine or Nifedipine.

To further test the suitability of this indicator gene for a high-throughput
screen, the regulation of the HES1 reporter was tested in two different strain
backgrounds. ABY11 (MATa leu2Δ1 ura3-52) is a wild-type strain. ABY140
(MATa his3Δ1 leu2Δ0 met15Δ0 pdr5::KanMX ura3Δ0 yor1::KanMX) is a strain
containing mutations in two multidrug resistance genes. Compared with ABY11,
induction of the HES1 reporter gene in ABY140 was found to be more sensitive to
Simvastatin and Econazole, but not to Flucytosine or Nifedipine.

The ABY140 [HES1] strain was used to screen approximately 16,800
chemicals from a combinatorial chemistry library. One percent of these chemicals
induced the HES1 indicator gene. Twenty-four of these chemicals were further tested
in a secondary screen for the ability to induce four additional indicator (also referred to
as reporter) genes whose expression is also coordinately regulated with genes
encoding ergosterol biosynthetic enzymes. Eight of these twenty-four chemicals also
induced these reporter genes, suggesting that these chemicals interfere with ergosterol
biosynthesis.

This example reveals how a high quality promoter sequence identified 
from systematic genome expression data can be employed with a significant degree of 
confidence to identify chemicals with a desired biological activity. 


The DNA and amino acid sequences of HES1 are shown in Figures 62
and 63, respectively. 

EXAMPLE 3: IDENTIFICATION OF YJL105w AS A TARGET GENE

YJL105w was a previously uncharacterized ORF which contains a PHD
finger, suggesting that it functions as a transcription factor (Figure 1). Gene
expression correlation coefficients were calculated for 1532 reporter constructs,
including known genes involved in sterol biosynthesis. Several uncharacterized genes,
including YJL105w, were found to have gene expression highly correlated with that of
genes encoding sterol biosynthetic enzymes. YJL105w expression correlated very well (0.83)
with expression of CYB5, a gene involved in ergosterol biosynthesis (Figure 2).
Cyb5p is thought to be an electron donor for sterol-modifying enzymes (Mitchell A.G.,
Martin C.E., J. Biol. Chem., 1995, 270(50):29766-72). Expression of YJL105w was
induced considerably by drugs that inhibit sterol biosynthesis as well as by a mutation
in the gene encoding HMG-CoA synthase (Figure 3). The YJL105w reporter
construct comprises 1200 base pairs of DNA sequence 5' to the ATG start codon and
thus contains sequence information sufficient to confer the observed regulated
expression.

To test whether YJL105w has a role in isoprenoid metabolism, a
yjl105w mutant in which the entire ORF was replaced with the kanamycin resistance gene
was constructed. Approximately 5x10^ cells of the yjl105w mutant strain and of a
wild-type control strain (ABY363, MATa his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0) were plated
onto separate non-selective agar plates. The sterol biosynthetic inhibitor lovastatin
(250 ng) was applied to a sterile disk on each lawn, and the cells were allowed to grow
overnight at 30°C. The yjl105w mutant strain was found to be significantly more
resistant to lovastatin treatment, further implicating this ORF in lipid metabolism
(Figure 4).

YJL105w appears to be fungal-specific, since no apparent mammalian
counterparts were found. Although YJL105w is not an essential gene, it could be
useful for constructing strains for specific applications. For instance, the resistance to
lovastatin conferred by the yjl105w mutation could result from an elevated flux through the
isoprenoid biosynthetic pathway. Such a condition may result from an altered
composition of the cell's lipid bilayer that triggers the induction of synthesis of
isoprenoid biosynthetic enzymes and/or reduces the cell's permeability to lovastatin. In
either of these cases, a strain defective for YJL105w could be useful for constructing
strains that could grow under extreme conditions, such as in industrial applications.
Examples of extreme conditions include growth at high or low temperatures (>35°C or
<20°C), in osmotically stressful conditions, or in the presence of amphipathic solutes.
Alternatively, the resistance to lovastatin in the yjl105w mutant could result from
decreased expression of membrane transporters or channels that allow entry of foreign
compounds (xenobiotics). In this case, overexpression of YJL105w could produce a
highly permeabilized strain that would have numerous applications where entry of
compounds into a cell is limited by permeability or availability of compounds. A
mammalian counterpart of this ORF, if found, could be useful as a diagnostic marker
for people with high serum cholesterol levels. Individuals carrying mutations in such a
gene (null or weak, hypomorphic, alleles) might be expected to have a higher rate of sterol
synthesis.

The DNA and protein sequences of YJL105w are depicted in Figures 39
and 40, respectively.

EXAMPLE 4: IDENTIFICATION OF YMR134w AS A TARGET GENE 

YMR134w is an ORF that had been suggested previously to be involved
in iron metabolism (Figure 5). Among 1532 reporter constructs, YMR134w
expression was found to be highly correlated with the expression of ERG2 (Figure 6),
and the gene is therefore likely to be involved in lipid metabolism. The YMR134w
reporter construct was highly induced by various statins (inhibitors of HMG-CoA
reductase) and azole compounds (inhibitors of lanosterol 14-alpha demethylase,
ERG11) (Figure 7). The YMR134w reporter construct comprises 1200 base pairs of
DNA sequence 5' to the ATG start codon and thus contains sequence information
sufficient to confer the observed regulated expression. A database search for
YMR134w-related protein sequences revealed a weak similarity to human vascular
endothelial growth factor receptor (Figure 8).

The DNA and protein sequences of YMR134w are depicted in Figures 
41 and 42, respectively. 



EXAMPLE 5: IDENTIFICATION OF YER044c AS A TARGET GENE

YER044c was a previously uncharacterized yeast ORF with one
predicted transmembrane domain (Figure 9). YER044c expression is significantly
correlated with the expression of ERG2 (0.82, Figure 10). Of the 498 treatments in the
GRM, statins, azoles and a deletion mutant of the ERG11 gene induced expression of
the YER044c reporter construct most significantly (Figure 11). The YER044c
reporter construct comprises 1200 base pairs of DNA sequence 5' to the ATG start
codon and thus contains sequence information sufficient to confer the observed
regulated expression. DNA and protein sequence database comparisons with the
predicted protein sequence of YER044c revealed an apparent Schizosaccharomyces
pombe counterpart and numerous apparent mammalian EST counterparts (Figures 12-14).

The DNA and protein sequences of YER044c are depicted in Figures
43 and 44, respectively. The apparent mouse, human and rat EST counterparts of
YER044c are depicted in Figures 45-47, respectively.

EXAMPLE 6: IDENTIFICATION OF YLR100w AS A TARGET GENE

YLR100w was a previously uncharacterized yeast ORF (Figure 15).
Expression of YLR100w correlated significantly (0.82) with that of CYB5 in the GRM
composed of 6036 reporter constructs in 706 experimental treatments. The correlation
of YLR100w expression with CYB5 expression implied a role for YLR100w in
lipid metabolism. Expression of the YLR100w reporter was induced significantly by
statins, azoles and in a yeast erg11 mutant, consistent with a role of YLR100w in lipid
metabolism (Figure 17). Searches of DNA and protein sequence databases for similar
sequences revealed a GenBank entry for a mouse 17-beta-hydroxysteroid dehydrogenase
cDNA (Figure 18).

The sequence of the mouse cDNA is shown in Figure 53. Given the
protein sequence similarity (Figure 19) and the fact that yeast is not known to
synthesize steroid hormones, it is conceivable that the mouse cDNA encodes a protein
with another role in lipid metabolism. In this case, the mammalian protein could have
utility as a pharmacological target to modulate lipid metabolism. Another GenBank
entry was found for a rat ovarian-specific protein with significant similarity to
YLR100w. The sequence of the rat protein is shown in Figure 65. Two mouse ESTs
were found to be significantly similar to YLR100w; their sequences are shown in
Figures 51 and 52. A human EST was found that was similar to
YLR100w, but to a lesser extent than the two mouse ESTs.

The DNA and protein sequences of YLR100w are depicted in Figures
48 and 49, respectively. The sequence of the human EST is shown in Figure 50.



EXAMPLE 7: IDENTIFICATION OF YER034w AS A TARGET GENE

YER034w is a yeast ORF that had been shown previously not to be
essential for cell viability (Figure 20). Expression of the YER034w reporter construct
was found to be correlated (0.75) with the expression of a GPA2 reporter construct in
a GRM composed of 1532 reporters treated under 498 experimental conditions
(Figure 21). GPA2 encodes the alpha subunit of a trimeric G protein involved in
pseudohyphal differentiation (Lorenz, M.C. and Heitman, J., EMBO J., 1997,
16:7008-7018). This correlation suggested that YER034w has a role in pseudohyphal
growth and could represent a new antifungal target.

To test this hypothesis, a diploid homozygous yer034w knockout strain
was purchased from Research Genetics (Huntsville, AL). Wild-type cells (ABY13,
MATa/MATalpha his3Δ1/his3Δ1 leu2Δ0/leu2Δ0 met15Δ0/MET15 LYS2/lys2Δ0
ura3Δ0/ura3Δ0) and the homozygous yer034w knockout strain were plated onto
low-nitrogen plates to stimulate pseudohyphal differentiation. After four days at 25°C,
plates were examined under a microscope. The yer034w knockout strain had
undergone significantly more differentiation than the wild-type control, both in terms of
the number of projections per colony (Figure 22) and the size of the hyphae. This result
implicated YER034w in the dimorphic transition of cells from yeast to pseudohyphae.
The ability of fungi to undergo this morphological transition has been suggested to be a
critical aspect of fungal pathogenicity. A search for related mammalian protein
sequences did not identify any obvious counterparts, suggesting that this protein is
fungal-specific and may be an amenable antifungal target.

The DNA and protein sequences of YER034w are depicted in Figures 
54 and 55, respectively. 

EXAMPLE 8: IDENTIFICATION OF YKL077w AS A TARGET GENE

YKL077w was a previously uncharacterized ORF with one predicted
transmembrane domain (Figure 23). Expression of the YKL077w reporter construct
was found to be correlated (0.92) with the expression of an SGV1 reporter construct in
a GRM composed of 1532 reporters treated under 498 experimental conditions
(Figure 24). Sgv1p is a Cdc28p-related protein kinase that is essential for cell
viability. In addition to SGV1 expression, YKL077w expression correlated highly
(>0.8) with that of PKC1 and RHO1 (Figure 25), genes involved in cell wall integrity and
cytoskeletal reorganization. Database searches with the predicted protein sequence of
YKL077w did not identify apparent mammalian counterparts (Figure 26). YKL077w
could represent an antifungal target, given the lack of a mammalian homolog and its
proposed involvement in cellular structure and/or proliferation. Nevertheless, in the
event a mammalian counterpart is discovered, it could represent an anti-proliferative
target as well.

The DNA and protein sequences of YKL077w are depicted in Figures
56 and 57, respectively.

EXAMPLE 9: IDENTIFICATION OF YGR046w AS A TARGET GENE 

YGR046w was a previously uncharacterized yeast ORF that has been
shown to be essential for viability (Figure 27). Expression of YGR046w correlated
significantly (0.90) with that of IRA2 in the GRM composed of 6036 reporter constructs in
706 experimental treatments (Figure 28). Ira2p is a GTPase-activating protein (GAP)
for Ras1p and Ras2p. In addition to IRA2 expression, YGR046w expression correlated
very well (>0.77) with the expression of known genes involved in cell proliferation
functions (Figure 29). The expression of YGR046w was found to be most sensitive to
agents that disrupt mitochondrial function, create oxidative stress or disrupt the
cytoskeleton (Figure 30).

Given its proposed involvement in cell proliferation, YGR046w could
represent a target for modulation of cell growth. A search of protein and DNA
sequence databases did not reveal any apparent mammalian homologs. Nevertheless, if
such a sequence is identified, it may represent an anti-proliferative mammalian target.

The DNA and protein sequences of YGR046w are depicted in Figures 
58 and 59, respectively. 

EXAMPLE 10: IDENTIFICATION OF YJR041c AS A TARGET GENE

Mutant strains defective for YJR041c have been shown previously to
display a severe growth defect, but no function for YJR041c was known (Figure 31).
Expression of YJR041c correlated significantly (0.83) with that of MED7 in the GRM
composed of 6036 reporter constructs in 706 experimental treatments (Figure 32).
Med7p is a component of the Mediator complex involved in RNA polymerase II
transcription. YJR041c expression was also found to correlate significantly (>0.71)
with that of several genes involved in different aspects of RNA metabolism. These
processes include RNA polymerase I and II transcription, mRNA splicing, RNA turnover
and ribosome function (Figure 33).

Database searches for related sequences identified similar sequences
from Schizosaccharomyces pombe (Figure 34). No obvious mammalian counterparts
were identified, suggesting that YJR041c is a fungal-specific protein. Given these
factors, YJR041c could represent an attractive target for antifungal therapy. In the
event a mammalian counterpart is identified, it also could represent a target with utility
for modulating cell proliferation.
The DNA and protein sequences of YJR041c are shown in Figures 60
and 61, respectively. 

EXAMPLE 11: SCREENING ASSAY USING THE GENOME REPORTER
MATRIX™ TO IDENTIFY TARGET INHIBITORS

A mutant or conditional allele of a target yeast gene is produced as
discussed above. The allele may be conditional either for function or for expression. For
instance, the conditional allele may be a temperature-sensitive allele of the target gene,
or the target gene may be operably linked to an inducible promoter for regulated
expression. In a preferred embodiment, the target gene is operably linked to an
inducible promoter that permits expression anywhere between 0% and 500% of wild-type
expression. The target gene of interest is introduced into and expressed in yeast cells
of the GRM that have a functional deletion of the target gene of interest. The level of
expression of the conditional allele is varied between 0% and 500% of wild-type
expression, and the expression of the reporter constructs of the GRM is measured in
response to the expression of the target gene. The expression of each reporter
construct is then correlated with the expression of the target gene. Thus, one can
identify a subset of genes that are either induced or repressed by overexpression of the
target gene.
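The dose-response step above can be sketched as follows: the target gene is expressed at several levels within the stated 0% to 500% range, and each reporter's readout is correlated with target expression. This is a minimal Python illustration that assumes Pearson correlation and a hypothetical |r| cutoff of 0.8; the reporter names, induction steps and readouts are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile (no information)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def target_dependent_reporters(levels, readouts, threshold=0.8):
    """Split reporters into those induced or repressed as target expression rises.

    levels:    target expression at each step, in % of wild type (0-500%)
    readouts:  {reporter: [reporter signal measured at each level]}
    threshold: hypothetical cutoff on |r|; the source does not specify one
    """
    induced, repressed = [], []
    for reporter, signal in readouts.items():
        r = pearson(levels, signal)
        if r >= threshold:
            induced.append(reporter)
        elif r <= -threshold:
            repressed.append(reporter)
    return sorted(induced), sorted(repressed)


if __name__ == "__main__":
    levels = [0, 25, 100, 250, 500]  # hypothetical induction steps
    readouts = {
        "REP1": [1.0, 1.2, 2.0, 3.1, 5.2],
        "REP2": [4.0, 3.8, 2.5, 1.6, 0.9],
        "REP3": [2.0, 2.1, 1.9, 2.0, 2.1],
    }
    print(target_dependent_reporters(levels, readouts))  # (['REP1'], ['REP2'])
```

Reporters that do not track the target level (REP3 here) fall into neither list, so only target-dependent reporters would be carried forward into the compound screen.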

The yeast strains containing reporters for the subset of genes whose expression is
dependent upon overexpression, and thus upon the function of the target gene, are then
used to screen compounds that are potential target inhibitors. The yeast strains are
incubated with the compounds. If a reporter gene in a particular yeast strain is induced
by overexpression of the target gene, then potential inhibitors are screened for the
ability to downregulate the reporter gene. Conversely, if a reporter gene is repressed
by overexpression of the target gene, then potential inhibitors are screened for the
ability to upregulate the reporter gene. Potential inhibitors are screened for the ability
to appropriately upregulate and downregulate a number of the genes whose expression
is dependent upon expression or overexpression of the target gene. When potential
target inhibitors are identified, these candidate compounds are tested for their ability to
inhibit the pathway of which the target gene is part. For instance, if the target gene is
YER034w, then the inhibitor may be tested for antifungal activity.
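The screening rule above (an inhibitor should move each target-dependent reporter opposite to the direction caused by target overexpression) can be written as a simple score. The Python below is a sketch under stated assumptions: the minimum log-ratio change of 0.5, the fraction-of-reporters score and all reporter and compound names are hypothetical, not values from the specification.

```python
def inhibitor_score(signature, response, min_change=0.5):
    """Fraction of signature reporters that a compound moves opposite to target overexpression.

    signature:  {reporter: +1 if induced by target overexpression, -1 if repressed}
    response:   {reporter: log ratio, compound-treated vs. untreated cells}
    min_change: hypothetical minimum |log ratio| that counts as a response
    """
    if not signature:
        raise ValueError("signature must contain at least one reporter")
    hits = 0
    for reporter, direction in signature.items():
        change = response.get(reporter, 0.0)  # an unmeasured reporter counts as unchanged
        # A candidate inhibitor should push each reporter against its overexpression direction.
        if change * direction <= -min_change:
            hits += 1
    return hits / len(signature)


def rank_compounds(signature, responses, min_score=1.0, min_change=0.5):
    """Names of compounds scoring at least min_score, best first."""
    scores = {name: inhibitor_score(signature, resp, min_change) for name, resp in responses.items()}
    passing = [name for name, score in scores.items() if score >= min_score]
    return sorted(passing, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    signature = {"REP1": +1, "REP2": -1}  # hypothetical target-dependent reporters
    responses = {
        "cmpd_a": {"REP1": -1.2, "REP2": 0.9},
        "cmpd_b": {"REP1": -0.8, "REP2": -0.1},
        "cmpd_c": {"REP1": 1.0, "REP2": 0.0},
    }
    print(rank_compounds(signature, responses))  # ['cmpd_a']
```

Requiring every signature reporter to respond (min_score=1.0) is the strictest setting; lowering it would admit compounds such as cmpd_b that reverse only part of the signature.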

If a target gene has a plant or animal counterpart, one may express the
plant or animal counterpart in a yeast strain lacking the target gene to determine whether
the counterpart can functionally substitute for the yeast gene. If it can, then the
plant or animal counterpart can be used in the above example to screen for potential
inhibitors of the plant or animal protein. This is especially useful if the target gene
has a mammalian counterpart. Similarly, even if a plant, animal or mammalian
counterpart has not been identified, potential inhibitors may be tested for their ability to
inhibit the pathway of which the target gene is part, if that pathway is shared by yeast and
higher eukaryotes.

EXAMPLE 12: SIMULTANEOUS TRACKING OF MULTIPLE REPORTERS 
AS REGULON INDICATOR GENES 

The effects of inactivating an osmotic stress pathway were tested by
deleting a pathway component (the Hog1p stress-activated protein kinase). Using the
hog1 knock-out profile as a model, multiple RIGs that would specifically indicate
pathway inhibitors were identified and tested in silico by examining all conditions in
which the selected RIGs were activated or repressed. It was determined that
simultaneously monitoring up-regulation of PGU1 and down-regulation of DAK1 gave
good specificity for pathway inactivation, as determined by the separation of the hog1
knock-out profile from all other conditions in which these two reporters were affected
(Figure 74). In this example, the RIGs were not part of the target regulon but were
chosen empirically based on their behavior under all conditions.
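The in silico test above amounts to a two-reporter decision rule. The following Python sketch assumes a fixed log-ratio cutoff of 1.0 (hypothetical) and measures specificity as the share of flagged conditions that are true pathway-inactivating conditions; only the gene names PGU1 and DAK1 come from the text, while the condition names and values are invented.

```python
def flagged_conditions(profiles, up_gene, down_gene, cutoff=1.0):
    """Conditions in which up_gene is induced and down_gene is repressed at the same time.

    profiles: {condition: {gene: log ratio}}
    cutoff:   hypothetical |log ratio| threshold
    """
    return {
        condition
        for condition, ratios in profiles.items()
        if ratios.get(up_gene, 0.0) >= cutoff and ratios.get(down_gene, 0.0) <= -cutoff
    }


def precision(flagged, expected):
    """Share of flagged conditions that are true pathway-inactivating conditions."""
    return len(flagged & expected) / len(flagged) if flagged else 0.0


if __name__ == "__main__":
    # Hypothetical log ratios; only the gene names PGU1 and DAK1 come from the text.
    profiles = {
        "hog1_deletion": {"PGU1": 2.1, "DAK1": -1.8},
        "treatment_x": {"PGU1": 1.5, "DAK1": 0.4},
        "treatment_y": {"PGU1": -0.2, "DAK1": -1.3},
    }
    expected = {"hog1_deletion"}
    single = {c for c, r in profiles.items() if r.get("PGU1", 0.0) >= 1.0}
    paired = flagged_conditions(profiles, "PGU1", "DAK1")
    print(precision(single, expected), precision(paired, expected))  # 0.5 1.0
```

Requiring both PGU1 induction and DAK1 repression drops the condition that induces PGU1 alone, illustrating why monitoring two RIGs can separate the knock-out profile more cleanly than either reporter by itself.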

Similarly, two RIGs were identified that could specifically indicate
mitochondrial inactivation, by comparing the behavior of these RIGs in the subset of
treatments that target mitochondria with all treatments that affect these RIGs. It was
determined that simultaneously measuring up-regulation of two RIGs (STE18 and
YGL19Hw) provides good specificity for mitochondrial perturbations, as determined by
the separation of this subset of common treatments from all other conditions that affect
these RIGs (Figure 75).



All publications and patent applications cited in this specification are
herein incorporated by reference as if each individual publication or patent application
were specifically and individually indicated to be incorporated by reference. Although
the foregoing invention has been described in some detail by way of illustration and
example for purposes of clarity of understanding, it will be readily apparent to those of
ordinary skill in the art in light of the teachings of this invention that certain
changes and modifications may be made thereto without departing from the spirit or
scope of the appended claims.
CLAIMS



We claim: 



1. A method for placing Gene X, a gene of unknown function, into a
functional genetic group comprising the steps of:

a) generating a gene expression profile for Gene X;

b) comparing the gene expression profile of Gene X with gene
expression profiles of a plurality of other genes in a database of
compiled gene expression profiles to generate expression
correlation coefficients;

c) identifying, based on their expression correlation coefficients, a
set of genes comprising Gene X that are coordinately expressed;

d) determining if the one or more genes whose expression is most
highly correlated with that of Gene X belong to a gene regulon
involved in a known biological pathway, or a common set of
biological reactions or functions; and

e) optionally testing the effect on Gene X expression of at least
one altered condition or treatment known to affect the function
to which Gene X has been ascribed;

wherein Gene X is placed in the gene regulon of d) if Gene X expression is
coordinate with expression of that regulon.
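Steps b) to d) of claim 1 can be illustrated with a short Python sketch. It assumes Pearson correlation, a hypothetical top-k neighbourhood with a minimum coefficient, and a simple majority vote over the regulon annotations of those neighbours; the claim itself does not specify any of these choices, and the gene and regulon names are placeholders.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def place_gene(query, profiles, regulon_of, top_k=5, min_r=0.7):
    """Assign Gene X to a regulon by majority vote of its most correlated annotated neighbours.

    query:        expression profile of Gene X
    profiles:     {gene: profile} for the other genes in the database
    regulon_of:   {gene: regulon name} for genes of known function
    top_k, min_r: hypothetical cutoffs; the claim does not fix them
    Returns the winning regulon name, or None if no majority exists.
    """
    ranked = sorted(((pearson(query, p), g) for g, p in profiles.items()), reverse=True)
    neighbours = [g for r, g in ranked[:top_k] if r >= min_r and g in regulon_of]
    if not neighbours:
        return None
    regulon, votes = Counter(regulon_of[g] for g in neighbours).most_common(1)[0]
    return regulon if 2 * votes > len(neighbours) else None


if __name__ == "__main__":
    gene_x = [3.0, 2.5, 0.0, -0.5]
    profiles = {
        "GENE_A": [2.9, 2.4, 0.1, -0.6],
        "GENE_B": [2.7, 2.6, 0.2, -0.3],
        "GENE_C": [-1.0, 0.0, 1.2, 0.8],
    }
    regulon_of = {"GENE_A": "sterol", "GENE_B": "sterol", "GENE_C": "kinase"}
    print(place_gene(gene_x, profiles, regulon_of))  # sterol
```

Returning None when no regulon wins a majority leaves Gene X unplaced rather than forcing an assignment; the optional step e) could then be used to test a candidate function experimentally.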

2. A method for identifying a regulon indicator gene in a database of
compiled gene expression profiles, wherein expression of the regulon indicator gene
correlates with the expression of at least one known gene in a group of coordinately
expressed genes or provides a measure of the function of a biological process of
interest, the method comprising the steps of:

a) comparing gene expression profiles of a plurality of genes in the
database to generate expression correlation coefficients;

b) identifying, based on their relative expression correlation
coefficients, a set of genes that are coordinately expressed;

c) selecting a set of genes from b) which comprises one or more
genes known to function in a particular biological pathway, or a
common set of biological reactions or functions;

d) selecting a member of the set of c) having one or more of the
following characteristics:

1) its expression profile is sensitive to one or more stimuli;

2) its expression profile exhibits a large dynamic range in
response to one or more stimuli;

3) its expression profile exhibits a rapid kinetic response to
one or more stimuli;

4) its expression profile is specific to a known biological
pathway or a common set of biological reactions or
functions;

5) the regulon indicator gene does not contain sequences
that are problematic for maintaining on plasmids when
introduced into host cells.



3. The method of claim 2, wherein the regulon indicator gene is
co-regulated with one or more genes in the group of coordinately expressed genes of c).

4. The method of claim 2, wherein the regulon indicator gene, upon 
expression, controls the expression of at least one other gene in the group of 
coordinately expressed genes of c). 

5. The method of claim 2, wherein the regulon indicator gene is of
previously unknown function.
6. A method for selecting a novel regulon target gene from a database
of compiled gene expression profiles, comprising the steps of:

a) comparing gene expression profiles of a plurality of genes in the
database to generate expression correlation coefficients;

b) identifying, based on their expression correlation coefficients, a
set of genes that are coordinately expressed;

c) selecting from b) a set of genes comprising one or more genes
of unknown function and one or more genes known to function
in a particular biological pathway, or a common set of biological
reactions or functions of interest;

d) selecting from the set of c) at least one gene of unknown
function, Gene X, as a novel regulon target gene; wherein Gene
X is a gene whose expression profile closely correlates to the
expression profiles of the one or more genes of the set of c)
known to function in the particular biological pathway, or
common set of biological reactions or functions of interest.



7. The method of claim 6, further comprising the step of generating
individual correlation coefficients between the gene expression profile of Gene X and a
plurality of genes in the database to assess the selectivity of Gene X as a novel regulon
target gene.

8. The method of claim 6, further comprising the step of determining
whether the protein encoded by Gene X exhibits substantial homology to a human,
non-human mammal, avian, amphibian, fish, insect or plant protein.



9. The method of claim 8, wherein said determining comprises the
steps of hybridizing Gene X to genomic DNA from human, non-human mammal,
avian, amphibian, fish, insect or plant cells or tissue under low stringency conditions.
10. The method of claim 8, wherein said determining comprises the

steps of: 

a) comparing the DNA sequence of Gene X to the DNA sequences 
from other organisms or 

b) obtaining an amino acid sequence encoded by Gene X and comparing 
it to amino acid sequences from other organisms. 



11. The method of any one of claims 8-10, wherein the DNA or amino
acid sequences from other organisms are contained within a database, and wherein the
DNA or amino acid sequence encoded by Gene X is compared to the DNA or amino
acid sequences from other organisms using a computer algorithm.



12. The method of claim 11, wherein the computer algorithm is blastp,
tblastn or another algorithm that utilizes string alignments.

13. The method of claim 6, further comprising the steps of:

a) disrupting the function of Gene X or its homolog in a yeast cell; and
b) identifying whether the function of Gene X is essential for yeast
germination, vegetative growth, pseudohyphal or hyphal growth.

14. A method for identifying a potential inhibitor of a regulon target
gene, comprising the steps of:

a) incubating a polypeptide comprising an amino acid sequence
encoded by a regulon target gene with a compound under
conditions effective to promote specific binding between the
polypeptide and the compound; and

b) determining whether the polypeptide bound to the compound;
wherein the compound is a potential inhibitor if the compound binds to
the polypeptide.
15. The method of claim 14, wherein the polypeptide comprises the
full-length amino acid sequence encoded by the regulon target gene.

16. The method of claim 14, wherein the polypeptide comprises a 
functional fragment of the amino acid sequence encoded by the regulon target gene. 

17. The method of claim 14, wherein the polypeptide is a fusion
protein comprising an epitope tag or reporter gene.

18. The method of claim 14, wherein the polypeptide is attached to a
solid support surface and the compound is in mobile phase. 

19. The method of claim 14, wherein the compound is attached to a
solid support surface and the polypeptide is in mobile phase.

20. The method of claim 14, wherein the compound is a library
selected from the group consisting of a combinatorial small organic library, a phage
display library and a combinatorial peptide library.

21. The method of claim 14, wherein said determining is performed by
ELISA, RIA or BiaCORE analysis.



22. The method of claim 14, wherein said determining is performed by 
high throughput screening. 



23. The method of claim 14, further comprising the step, performed 
before step a), of expressing in a host cell a regulon target gene. 



24. The method of claim 14, wherein the target gene is selected from
the group consisting of YMR134w, YER034w, YJL105w, YKL077w, YGR046w,
YJR041c, YER044c and YLR100w and their mammalian homologs.

25. The method of claim 14, wherein the target gene is human EST
W28235, a homolog of YER044c.

26. The method of claim 14, wherein the target gene is human EST
R92053, a homolog of YLR100w.

27. The method of claim 14, wherein the target gene is mouse EST 
AI386195, a homolog of YER044c. 

28. The method of claim 14, wherein the target gene is mouse EST
AI226514, a homolog of YLR100w.

29. The method of claim 14, wherein the target gene is mouse EST
AI528381, a homolog of YLR100w.

30. The method of claim 14, wherein the target gene is mouse gene
3319971, a homolog of YLR100w.

31. The method of claim 14, wherein the target gene is rat gene
1397235, a homolog of YLR100w.

32. The method of claim 14, further comprising performing, before
step a), the step of expressing in a host cell a regulon target gene selected from the
group consisting of YMR134w, YER034w, YJL105w, YKL077w, YGR046w, YJR041c,
YER044c and YLR100w and their mammalian homologs.






33. A method for identifying a potential inhibitor of a regulon target
gene, comprising the steps of:

a) creating a host cell in which the target gene has been altered or
inactivated by mutation;

b) comparing gene expression profiles in the mutated host cell to
those in a host cell which expresses the normal target gene;

c) identifying one or more potential target-dependent reporter
genes whose expression is altered in the host cell in which the
target gene has been altered or inactivated compared to the host
cell which expresses the normal target gene;

d) screening one or more compounds for their effects on
expression of the target-dependent reporter gene;

wherein if expression of the target-dependent reporter gene increases in
the host cell harboring an altered or inactivated target gene, then a potential inhibitor
of the regulon target gene will increase expression of the target-dependent reporter
gene, and if expression of the target-dependent reporter gene decreases in the host cell
harboring an altered or inactivated target gene, then a potential inhibitor of the regulon
target gene will decrease expression of the target-dependent reporter gene.



34. The method of claim 33, further comprising the step, performed
before step d), of assessing the specificity of a potential target-dependent reporter gene
by comparing the gene expression profile of the potential target-dependent reporter gene
to those of a plurality of genes in a database of compiled gene expression profiles to
generate individual expression correlation coefficients, wherein a target-dependent
reporter gene whose expression correlates with the expression of the regulon target gene
and with a minimal number of other genes, or with no other gene, is selected over one
whose expression correlates with a greater number of genes based on expression
correlation coefficients.
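The selection rule in claim 34 can be sketched as follows, assuming Pearson correlation and a hypothetical cutoff of r >= 0.7 for counting a gene as correlated. The gene names and profiles are invented for illustration and are not data from the specification.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def pick_specific_reporter(candidates, target, profiles, threshold=0.7):
    """Choose the candidate that tracks the target gene while correlating with the fewest other genes.

    candidates: names of potential target-dependent reporter genes
    target:     name of the regulon target gene
    profiles:   {gene: expression profile} from the compiled database
    threshold:  hypothetical minimum r counted as "correlated"
    Returns (best candidate, number of other correlated genes), or (None, None).
    """
    target_profile = profiles[target]
    best, best_count = None, None
    for cand in candidates:
        profile = profiles[cand]
        if pearson(profile, target_profile) < threshold:
            continue  # a useful reporter must correlate with the target gene itself
        count = sum(
            1
            for gene, other in profiles.items()
            if gene not in (cand, target) and pearson(profile, other) >= threshold
        )
        if best_count is None or count < best_count:
            best, best_count = cand, count
    return best, best_count


if __name__ == "__main__":
    profiles = {
        "TARGET": [1, 1, -1, -1, 0, 0, 0, 0],
        "R1": [1, 1, -1, -1, 0, 0, 0, 0],
        "R2": [1, 1, -1, -1, 1, 1, -1, -1],
        "G1": [0, 0, 0, 0, 1, 1, -1, -1],
        "G2": [1, -1, 1, -1, 1, -1, 1, -1],
    }
    print(pick_specific_reporter(["R1", "R2"], "TARGET", profiles))  # ('R1', 1)
```

Both candidates track the target here, but R2 also follows an unrelated pattern shared with G1, so R1 is preferred as the more specific reporter.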



35. The method of claim 33 or 34, wherein upstream sequences that
control expression of the target-dependent reporter gene are fused to a heterologous
coding sequence and that fusion is used to screen compounds for potential inhibitors of
the regulon target gene.

36. The method of claim 35, wherein the heterologous sequence 
comprises an epitope tag or a reporter gene. 

37. The method of claim 35, wherein the fusion polypeptide is attached
to a solid support surface and the compound is in mobile phase.

38. The method of claim 35, wherein the compound is attached to a 
solid support surface and the fusion polypeptide is in mobile phase. 

39. The method of claim 33, wherein the compound is a library
selected from the group consisting of a combinatorial small organic library, a phage
display library and a combinatorial peptide library.

40. The method of claim 33, wherein said screening is performed by
ELISA, RIA or BiaCORE analysis.

41. The method of claim 33, wherein said screening is performed by
high throughput screening.

42. The method of claim 33, wherein the target gene is selected from
the group consisting of YMR134w, YER034w, YJL105w, YKL077w, YGR046w,
YJR041c, YER044c and YLR100w and their mammalian homologs.

44. The method of claim 33, wherein the target gene is human EST
W28235, a homolog of YER044c.
45. The method of claim 33, wherein the target gene is human EST
R92053, a homolog of YLR100w.

46. The method of claim 33, wherein the target gene is mouse EST
AI386195, a homolog of YER044c.

47. The method of claim 33, wherein the target gene is mouse EST
AI226514, a homolog of YLR100w.

48. The method of claim 33, wherein the target gene is mouse EST
AI528381, a homolog of YLR100w.

49. The method of claim 33, wherein the target gene is mouse gene
3319971, a homolog of YLR100w.


50. The method of claim 33, wherein the target gene is rat gene
1397235, a homolog of YLR100w.

51. A method for inhibiting the expression of a regulon target gene in a
host cell comprising the step of introducing into the host cell an inhibitor made
according to any one of claims

52. The method of claim 51, wherein the target gene is selected from
the group consisting of YMR134w, YER034w, YJL105w, YKL077w, YGR046w,
YJR041c, YER044c and YLR100w and their mammalian homologs.

53. An antisense oligonucleotide comprising a sequence
complementary to the sequence of an mRNA of a regulon target gene and effective to
decrease transcription or translation of the gene.
54. The antisense oligonucleotide of claim 53 complementary to the
sequence of the mRNA of a target gene selected from the group consisting of
YMR134w, YER034w, YJL105w, YKL077w, YGR046w, YJR041c, YER044c and
YLR100w and their mammalian homologs.

55. A ribozyme comprising a sequence complementary to the sequence
of an mRNA of a regulon target gene and effective to decrease transcription or
translation of the gene.

56. The ribozyme of claim 55 complementary to the sequence of the
mRNA of a target gene selected from the group consisting of YMR134w, YER034w,
YJL105w, YKL077w, YGR046w, YJR041c, YER044c and YLR100w and their
mammalian homologs.

57. A neutralizing antibody to a protein encoded by a regulon target gene of a 
yeast or its mammalian homolog. 

58. The neutralizing antibody of claim 57, wherein the target gene is selected from
the group consisting of YMR134w, YER034w, YJL105w, YKL077w, YGR046w,
YJR041c, YER044c and YLR100w and their mammalian homologs.

59. A fusion protein comprising an amino acid sequence encoded by a 
regulon target gene of a yeast or its mammalian homolog and further comprising an 
epitope tag or a reporter gene. 

60. The fusion protein of claim 59, wherein the target gene is selected
from the group consisting of YMR134w, YER034w, YJL105w, YKL077w, YGR046w,
YJR041c, YER044c and YLR100w and their mammalian homologs.
61. A method for identifying a gene regulated by a regulon target gene
of a yeast or its mammalian homolog, comprising the steps of:

a) overexpressing the target gene in host cells of a matrix
comprising a plurality of units of cells, the cells in each unit
containing a reporter gene operably linked to an expression
control sequence derived from a gene of a selected organism;
and

b) identifying genes that are either induced or repressed by
overexpression of the target gene.

62. The method according to claim 61, wherein the target gene is
selected from the group consisting of YMR134w, YER034w, YJL105w, YKL077w,
YGR046w, YJR041c, YER044c and YLR100w and their mammalian homologs.

63. A method for identifying a regulon indicator gene in a database of
compiled gene expression profiles, wherein expression of the regulon indicator gene
provides a measure of the function of a biological pathway or process of interest, the
method comprising the steps of:

a) examining exemplary expression profiles in response to one or 
more chemical or genetic treatments which target the pathway 
or process of interest to generate reporter sensitivity data; 

b) selecting a set of genes from a) which comprises one or more 
genes most significantly affected in response to the treatment or 
treatments; and 

c) selecting at least one gene from b) whose expression profile is 
maximized for its specificity and sensitivity to the treatment or 
class of treatments in a) compared to its sensitivity to all other 
treatments in the database. 
64. The method of claim 63, wherein the regulon indicator gene is
co-regulated with one or more genes in the set of genes of a). 



65. The method of claim 63, wherein the regulon indicator gene, upon 
expression, controls the expression of at least one other gene in the set of genes of a). 
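Claim 63 describes the selection of a regulon indicator gene only in words. As an illustration, the Python sketch below scores steps (a) to (c) over a small table of expression log ratios. The data layout (`{treatment: {gene: log_ratio}}`), the hypothetical function name `select_indicator_gene`, the `top_n` cutoff used for step (b), and the choice of "mean response to the target treatments minus mean response to all other treatments" as the specificity score for step (c) are assumptions made for this example, not the scoring prescribed by the application. The toy numbers are invented for illustration.

```python
from statistics import mean


def select_indicator_gene(profiles, target_treatments, top_n=5):
    """Illustrative scoring of claim 63, steps (a)-(c).

    profiles: hypothetical layout {treatment: {gene: log_ratio}}.
    target_treatments: treatments assumed to act on the pathway of interest.
    """
    # Only genes measured in every experiment can be compared.
    genes = sorted(set.intersection(*(set(p) for p in profiles.values())))
    others = [t for t in profiles if t not in target_treatments]

    # Step (a): sensitivity = mean absolute response to the target treatments.
    sensitivity = {
        g: mean(abs(profiles[t][g]) for t in target_treatments) for g in genes
    }

    # Step (b): keep the genes most strongly affected by those treatments.
    candidates = sorted(genes, key=lambda g: sensitivity[g], reverse=True)[:top_n]

    # Step (c): prefer a candidate whose response is large under the target
    # treatments but small under every other treatment in the database.
    def specificity(g):
        background = mean(abs(profiles[t][g]) for t in others) if others else 0.0
        return sensitivity[g] - background

    return max(candidates, key=specificity)


# Invented example: gene2 responds most strongly, but equally to an
# unrelated treatment, so step (c) favours gene1.
profiles = {
    "statin_1": {"gene1": 2.0, "gene2": 3.0, "gene3": 0.1},
    "statin_2": {"gene1": 2.0, "gene2": 3.0, "gene3": 0.1},
    "unrelated": {"gene1": 0.1, "gene2": 3.0, "gene3": 0.1},
}
print(select_indicator_gene(profiles, {"statin_1", "statin_2"}, top_n=2))  # gene1
```

With `top_n=1` only the most sensitive gene survives step (b), so the specificity check in step (c) has nothing to choose between; a wider candidate set is what lets specificity override raw sensitivity.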

YJL105W

GenBank No. 1008286
Chromosome
Protein 559 amino acids, 63,867 Daltons
Comments: contains a PHD finger

Figure 1.
CYB5



Figure 2. 
Regulated Expression of YJL105w

Natural
Expt         Level  Log Ratio    Treatment [baseline]
1455         9.1    +3.2         4.0ug/ml Fluvastatin - 18 hr [0.09]
1454         8.1    +3.1         8.0ug/ml Fluvastatin - 18 hr [0.13]
1537         7.9    +3.1         20ug/ml Lovastatin in 1 Ethanol - 18 hr [0.10]
1420         7.8    +3.1         20ug/ml Atorvastatin in 1 DMSO - 18 hr [0.14]
3455         7.8    +3.1         20ug/ml Lovastatin - 18 hr [0.20]
3456         7.8    +3.1         25ug/ml Lovastatin - 18 hr [0.20]
1944         6.5    +2.9         30ug/ml Mevastatin in 1.5 Ethanol - 18 hr [0.20]
1943         6.4    +2.9         15ug/ml Simvastatin in 1.5 Ethanol - 18 hr [0.13]
1554         5.8    +2.8         5ug/ml Simvastatin in 1 Ethanol - 18 hr [0.12]
[illegible]  5.9    [illegible]  30ug/ml Atorvastatin in 1 DMSO - 18 hr [0.12]
1553         5.1    +2.6         10ug/ml Simvastatin in 1 Ethanol - 18 hr [0.11]
3454         5.1    +2.6         10ug/ml Lovastatin - 18 hr [0.15]
1538         4.8    +2.6         10ug/ml Lovastatin in 1 Ethanol - 18 hr [0.09]
1421         4.4    +2.5         10ug/ml Atorvastatin in 1 DMSO - 18 hr [0.12]
1541         4.2    +2.4         10ug/ml Mevastatin in 1 Ethanol - 18 hr [0.08]
1456         4.1    +2.4         2.0ug/ml Fluvastatin - 18 hr [0.06]
1539         4.0    +2.4         5ug/ml Lovastatin in 1 Ethanol - 18 hr [0.08]
1540         4.0    +2.4         20ug/ml Mevastatin in 1 Ethanol - 18 hr [0.10]
2756         3.9    +2.4         [hmgs - ABY244.1 regulated (60)] - 18 hr [0.21]
2757         3.8    +2.3         [hmgs - ABY244.1 regulated (80)] - 18 hr [0.20]
2061         3.3    +2.2         35ug/ml Atorvastatin in 1 Ethanol - 18 hr [0.08]
1982         3.0    +2.1         0.125ug/ml Clotrimazole in 1 Methanol - 18 hr [0.19]
2060         2.9    +2.1         25ug/ml Atorvastatin in 1 Ethanol - 18 hr [0.07]
1542         2.8    +2.0         5ug/ml Mevastatin in 1 Ethanol - 18 hr [0.08]
1999         2.7    +2.0         20ug/ml Atorvastatin in 1 Ethanol - 18 hr [0.08]
3279         2.7    +2.0         0.15ug/ml Clotrimazole in 1 DMSO - 18 hr [0.13]
1935         2.6    +2.0         0.04ug/ml Econazole in 1 Methanol - 18 hr [0.18]
1478         2.5    +1.9         2.0ug/ml Fluconazole in 0.9 Saline - 18 hr [0.27]
1477         2.5    +1.9         3.0ug/ml Fluconazole in 0.9 Saline - 18 hr [0.31]
1983         2.5    +1.9         0.15ug/ml Clotrimazole in 1 Methanol - 18 hr [0.15]
3468         2.5    +1.9         20ug/ml Lovastatin [ABY139] - 18 hr [0.58]
2754         2.5    +1.9         [hmgs - ABY244.1 regulated (20)] - 18 hr [0.19]

Figure 3.
Wild-Type YJL105W Knockout



Figure 4. 
YMR134W

GenBank No. 606432
Chromosome
Protein 236 amino acids, 27,911 Daltons
Comments: involved in iron metabolism; potential transmembrane domain

Figure 5.

Figure 6.
Treatments Causing Highest Expression of YMR134w

Experiment  Level  Log Ratio  Treatment [baseline]
1943        1.3    +1.8       15ug/ml Simvastatin in 1.5 Ethanol - 18 hr [0.13]
1944        1.2    +1.7       30ug/ml Mevastatin in 1.5 Ethanol - 18 hr [0.20]
1419        1.2    +1.7       30ug/ml Atorvastatin in 1 DMSO - 18 hr [0.12]
1537        1.2    +1.7       20ug/ml Lovastatin in 1 Ethanol - 18 hr [0.10]
1454        1.2    +1.7       8.0ug/ml Fluvastatin - 18 hr [0.13]
1477        1.0    +1.5       3.0ug/ml Fluconazole in 0.9 Saline - 18 hr [0.31]
1553        0.9    +1.5       10ug/ml Simvastatin in 1 Ethanol - 18 hr [0.11]
1455        0.9    +1.5       4.0ug/ml Fluvastatin - 18 hr [0.09]
3455        0.9    +1.5       20ug/ml Lovastatin - 18 hr [0.20]
3456        0.9    +1.5       25ug/ml Lovastatin - 18 hr [0.20]
1538        0.9    +1.4       10ug/ml Lovastatin in 1 Ethanol - 18 hr [0.09]
3454        0.9    +1.4       10ug/ml Lovastatin - 18 hr [0.15]
1478        0.8    +1.4       2.0ug/ml Fluconazole in 0.9 Saline - 18 hr [0.27]
1540        0.8    +1.3       20ug/ml Mevastatin in 1 Ethanol - 18 hr [0.10]
1420        0.8    +1.3       20ug/ml Atorvastatin in 1 DMSO - 18 hr [0.14]
1611        0.8    +1.3       10ug/ml Fluconazole - 21 hr [0.04]
1554        0.7    +1.2       5ug/ml Simvastatin in 1 Ethanol - 18 hr [0.12]
3279        0.7    +1.2       0.15ug/ml Clotrimazole in 1 DMSO - 18 hr [0.13]
3469        0.7    +1.2       25ug/ml Lovastatin [ABY139] - 18 hr [0.57]
1605        0.7    +1.2       5ug/ml Fluconazole - 21 hr [0.04]
1936        0.7    +1.1       0.05ug/ml Econazole in 1 Methanol - 18 hr [0.14]
3468        0.7    +1.1       20ug/ml Lovastatin [ABY139] - 18 hr [0.58]

Figure 7.
YER044C

GenBank No. 603277
Chromosome
Protein 148 amino acids, 17,140 Daltons
Comments: unknown function; potential transmembrane domain

Figure 9.
YER044C



ERG2 



Figure 10. 
Treatments Causing Highest Expression of YER044c

Experiment  Level  Log Ratio  Treatment [baseline]
1419        4.2    +1.7       30ug/ml Atorvastatin in 1 DMSO - 18 hr [0.12]
1420        3.6    +1.5       20ug/ml Atorvastatin in 1 DMSO - 18 hr [0.14]
1617        3.3    +1.4       20ug/ml Fluconazole - 21 hr [0.04]
1454        3.2    +1.4       8.0ug/ml Fluvastatin - 18 hr [0.13]
1537        3.1    +1.4       20ug/ml Lovastatin in 1 Ethanol - 18 hr [0.10]
1943        .0     +1.3       15ug/ml Simvastatin in 1.5 Ethanol - 18 hr [0.13]
1623        .0     +1.3       100ug/ml Fluconazole - 21 hr [0.04]
3456        .0     +1.3

10ug/ml Simvastatin in 1 Ethanol - 18 hr [0.11]
5ug/ml Fluconazole - 21 hr [0.04]
+1.1
+1.1

Figure 11.
Mouse EST with similarity to YER044c

gb|AI386195|AI386195 mq60h05.y1 Soares 2NbMT Mus musculus cDNA clone
IMAGE:583161 5' similar to SW:YEN4_YEAST P40030 HYPOTHETICAL
17.1 KD PROTEIN YER044c, mRNA sequence
[Mus musculus]
Length = 455

Score = 81.5 bits (198), Expect = 6e-15
Identities = 40/114 (35%), Positives = 68/114 (59%)
Frame = +3

Query: 23 LPKWLLFISIVSVFNSIQTYVSGLELTRKVYERKPTETTHLSARTFGTWTFISCVIRFYG 82
L WL+ +SI+++ N++Q++ L K+Y KP L ARTFG WT +S VIR
Sbjct: 93 LRSWLVMVSIIAMGNTLQSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRCLC 272

Query: 83 AMYLNEPHIFELVFMSYMVALFHFGSELLIFRTCKLGKGFMGPLVVSTTSLVWM 136
++ ++ + ++++AL HF SEL +F T G + PL+V++ S++ M
Sbjct: 273 AIDIHNKTLYHITLWTFLLALGHFLSELFVFGTAAPTVGVLAPLMVASFSILGM 434

Human EST with similarity to YER044c

gb|W28235|W28235 43h8 Human retina cDNA randomly primed sublibrary
Homo sapiens cDNA.
Length = 839

Score = 69.9 bits (168), Expect = 2e-11
Identities = 33/94 (35%), Positives = 55/94 (58%)
Frame = +1

Query: 23 LPKWLLFISIVSVFNSIQTYVSGLELTRKVYERKPTETTHLSARTFGTWTFISCVIRFYG 82
L WL+ +SI+++ N++Q++ L K+Y KP L ARTFG WT +S VIR
Sbjct: 112 LRSWLVMVSIIAMGNTLQSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRCLC 291

Query: 83 AMYLNEPHIFELVFMSYMVALFHFGSELLIFRTC 116
A+ ++ ++ + ++++AL HF SEL + C
Sbjct: 292 AIDIHNKTLYHITLWTFLLALGHFLSELFVLWNC 393



Figure 13 
Rat EST with similarity to YER044c

gb|AI172515|AI172515 UI-R-C2p-nu-d-02-0-UI.s1 UI-R-C2p Rattus
norvegicus cDNA clone UI-R-C2p-nu-d-02-0-UI 3', mRNA
sequence [Rattus norvegicus]
Length = 475

Score = 80.8 bits (196), Expect = 1e-14
Identities = 40/114 (35%), Positives = 68/114 (59%)
Frame = -3

Query: 23 LPKWLLFISIVSVFNSIQTYVSGLELTRKVYERKPTETTHLSARTFGTWTFISCVIRFYG 82
L WL+ +SI+++ N++Q++ L K+Y KP L ARTFG WT +S VIR
Sbjct: 404 LRSWLVMVSIIAMGNTLQSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRCLC 225

Query: 83 AMYLNEPHIFELVFMSYMVALFHFGSELLIFRTCKLGKGFMGPLVVSTTSLVWM 136
A+ ++ ++ + ++++AL HF SEL +F T G + PL+V++ S++ M
Sbjct: 224 AIDIHNKTLYHITLWTFLLALGHFLSELFVFGTAAPTVGVLAPLMVASFSILGM 63



Figure 14. 
YLR100w

GenBank No. 1360483
Chromosome XII
Protein 347 amino acids, 39,725 Daltons
Comments: unknown function; see S. Huang et al., Biochemistry, 26, pp.
8242-46 (1987)

Figure 15.

SUBSTITUTE SHEET (RULE 26)

WO 00/58521

PCT/US00/08604

YLR100w

Figure 16.
Treatments Causing Highest Expression of YLR100w

Experiment  Level  Treatment [baseline]
6092        8.3    20ug/ml Lovastatin in 1 Ethanol [ABY12.1] - 24 hr [0.15]
8717        6.7    10ug/ml Simvastatin in 1 DMSO [ABY12.1] - 24 hr [0.14]
6093        6.3    10ug/ml Lovastatin in 1 Ethanol [ABY12.1] - 24 hr [0.16]
8716        6.1    7.5ug/ml Simvastatin in 1 DMSO [ABY12.1] - 24 hr [0.13]
8715        4.9    5ug/ml Simvastatin in 1 DMSO [ABY12.1] - 24 hr [0.12]
6094        4.4    5ug/ml Lovastatin in 1 Ethanol [ABY12.1] - 24 hr [0.13]
8705        2.7    [erg11 - ABY210 regulated (100)] - 24 hr [0.17]
6088        2.6    0.1ug/ml Sulconazole in 1 DMSO [ABY12.1] - 24 hr [0.12]
8341        2.5    0.025ug/ml Miconazole in 1 DMSO [ABY12.1] - 24 hr [0.15]
8460        2.4    0.1ug/ml Clotrimazole in 1 DMSO [ABY12.1] - 24 hr [0.12]
8462        2.3    0.135ug/ml Clotrimazole in 1 DMSO [ABY12.1] - 24 hr [0.17]
8461        2.3    0.12ug/ml Clotrimazole in 1 DMSO [ABY12.1] - 24 hr [0.14]
8342        2.3    0.03ug/ml Miconazole in 1 DMSO [ABY12.1] - 24 hr [0.19]
8703        2.1    [erg11 - ABY210 regulated (80)] - 24 hr [0.14]
8340        2.0    0.02ug/ml Miconazole in 1 DMSO [ABY12.1] - 24 hr [0.12]
8463        2.0    0.15ug/ml Clotrimazole in 1 DMSO [ABY12.1] - 24 hr [0.25]
8701        1.9    [erg11 - ABY210 regulated (60)] - 24 hr [0.14]

Figure 17.
Alignment of YLR100w to Mammalian ESTs

gb|AI226514|AI226514 uj07d08.y1 Sugano mouse liver mlia Mus musculus cDNA
clone IMAGE:1891215 5' similar to TR:Q62904 Q62904
OVARIAN-SPECIFIC PROTEIN, mRNA sequence [Mus
musculus] Length = 1039

Score = 63.2 bits (151), Expect = 5e-09
Identities = 53/223 (23%), Positives = 108/223 (47%), Gaps = 11/223 (4%)

Query: 3 RKVAIVTGTNSNLGLNIVFRLIETEDTNVRLTIVVTSRTLPRVQEVINQIKDFYNKSGRV 62
RKV ++TG +S +GL + RL+ +D L + +RL + + V+ + + +
Sbjct: 52 RKVVLITGASSGIGLALCGRLLAEDDD LHLCLACRNLSKARAVRDTLLASHPSA 213

Query: 63 EDLEIDFDYLLVDFTNMVSVLNAYYDINKKYRAINYLFVNAA QGIFDGIDW 113
+ + +D +++ SV+ ++ +K++ ++YL++NA + F GI +
Sbjct: 214 EVSIVQMDVSSLQSVVRGAEEVKQKFQRLDYLYLNAGILPNPQFNLKAFFCGI-F 375

Query: 114 IGAVKEVFTNPLEAVTNPTYKIQLVGVKSKDDMGLIFQANVFGPYYFISKILPQLTRGK- 172
V +FT E + + G++ +F+ N+FG + I ++ P L
Sbjct: 376 SRNVIHMFTTA-EGILTQNDSVTADGLQE VFETNLFGHFILIRELEPLLCHADN 534

Query: 173 -AYIVWISSIMSDPKYLSLNDIELLKTNASYEGSKRLVDLLHLATYKDLKKLGI 225
+ ++W SS + SL DI+ K Y + DLL++A ++ K G+
Sbjct: 535 PSQLIWTSSRNAKKANFSLEDIQHFKGPEPYSSFQYATDLLNVAXNREFKPEGL 696

gb|AI528381|AI528361 ui96g06.y1 Sugano mouse liver mlia Mus musculus cDNA
clone IMAGE:1690298 5' similar to TR:Q62904 Q62904
OVARIAN-SPECIFIC PROTEIN, mRNA sequence [Mus
musculus] Length = 837

Score = 52.3 bits (123), Expect = 1e-05
Identities = 59/260 (22%), Positives = 119/260 (45%), Gaps = 11/260 (4%)

Query: 3 RKVAIVTGTNSNLGLNIVFRLIETEDTNVRLTIVVTSRTLPRVQEVINQIKDFYNKSGRV 62
RKV ++TG +S +GL + RL+ +D L++RL++V++ ++
Sbjct: 52 RKVVLITGASSGIGLALCGRLLAEDDD LHLCLACRNLSKARAVRDTLLASHPSA 213

Query: 63 EDLEIDFDYLLVDFTNMVSVLNAYYDINKKYRAINYLFVNAA QGIFDGIDW 113
+ + +D +++ SV+ ++ +K++ ++YL++NA + F GI +
Sbjct: 214 EVSIVQMDVSSLQSVVRGAEEVKQKFQRLDYLYLNAGILPNPQFNLKAFFCGI-F 375

Query: 114 IGAVKEVFTNPLEAVTNPTYKIQLVGVKSKDDMGLIFQANVFGPYYFISKILPQLTRGK- 172
V +FT E + + + D + +F+ N+ + I ++ P L
Sbjct: 376 SRNVIHMFTTA-EGILTQNDSV TADRLQEVFETNLSCHFILIRELEPLLLHADN 534

Query: 173 -AYIVWISSIMSDPKYLSLNDIELLKTNASYEGSKRLVDLLHLATYKDLKKLGINQYVVQ 231
+ ++W SS + SL D + Y + DLL++A + + G+ +
Sbjct: 535 PSQLIWTSSRNAXKANFSLEDXQHSIGPGPYSSFQYATDLLNVALNXNXNQKGLYSSRMC 714

Query: 232 PGIFTSHSFSEYLNFFTYFGMLCLFYLARLL 262
PG+ ++ TY G+L FYL LL
Sbjct: 715 PGVVMTN MTY-GILPPFYLDVLL 780



Figure 19. 
gb|R92053|R92053 yp96c01.r1 Homo sapiens cDNA clone 195264 5'. Length = 454
Score = 44.1 bits (102), Expect = 0.003
Identities = 26/84 (30%), Positives = 40/84 (46%), Gaps = 2/84 (2%)
Frame = +1

Query: 150 FQANVFGPYYFISKILPQLTRGK--AYIVWISSIMSDPKYLSLNDIELLKTNASYEGSKR 207
F+ NVFG + I ++ P L + ++W SS + SL D + K Y SK
Sbjct: 1 FETNVFGHFILIRELEPLLCHSDNPSQLIWTSSRSARKSNFSLEDFQHSKGKEPYSSSKY 180

Query: 208 LVDLLHLATYKDLKKLGINQYVVQPG 233
DLL +A ++ + G+ V PG
Sbjct: 181 ATDLLSVALNRNFNQQGLYSNVACPG 258

Figure 19 (cont). 
YER034W

GenBank No. 603267
Chromosome V
Protein 1 8S amino acids, 21,186 Daltons
Comments: unknown function; see S. Huang et al., Biochemistry, 26, pp.
8242-46 (1987)



Figure 20 



YER034W 




GPA2 



Figure 21. 
Mutation of the YER034w Gene Leads to Increased Pseudohyphal Growth

Wild Type     yer034wΔ

Figure 22. 
YKL077W

GenBank No. 486110
Chromosome XI
Protein 392 amino acids, 46,042 Daltons
Comments: unknown function; potential transmembrane domain

Figure 23
YKL077W

SGV1



Figure 24. 
Expression Correlation of YKL077w

Rank  Gene     Correlation  Exp          Function
1     YKL077W  +1.00        0.5 - 9.1
2     SGV1     +0.92        0.7 - 14.4   CDC28/cdc2 related protein kinase
3     RHO1     +0.88        1.3 - 20.9   GTP-binding protein
4     YKL075C  +0.86        0.2 - 2.5
5     SRA3     +0.84        0.3 - 4.6    catalytic subunit of PKA
6     RPB4     +0.84        0.3 - 7.8    subunit of RNA polymerase II
7     PKC1     +0.84        0.6 - 11.7   putative protein kinase

Figure 25.
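Figure 25 ranks genes by how closely their expression tracks that of YKL077w across experiments, but the document does not state which correlation measure was used. The Python sketch below assumes the Pearson coefficient; the function names, the `{gene: [values]}` layout and the toy profiles are hypothetical and are not data from the figure. A gene whose profile is constant would make the coefficient undefined (division by zero), so such genes would need to be filtered out first.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_by_correlation(profiles, query):
    """Rank genes by correlation with `query`, highest first.

    profiles: hypothetical layout {gene: [expression in each experiment]},
    every list covering the same experiments in the same order.
    """
    reference = profiles[query]
    scores = {gene: pearson(reference, values) for gene, values in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented profiles over four experiments.
profiles = {
    "query": [1.0, 2.0, 3.0, 4.0],
    "gene_a": [2.0, 4.0, 6.0, 8.0],
    "gene_b": [1.0, 3.0, 2.0, 4.0],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
}
for rank, (gene, r) in enumerate(rank_by_correlation(profiles, "query"), start=1):
    print(rank, gene, f"{r:+.2f}")
```

As in Figure 25, the query gene itself appears at the top with a correlation of +1.00; genes that move in the opposite direction (here `gene_c`) fall to the bottom with negative values.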
YGR046W

GenBank No. 1323049
Chromosome VII
Protein 385 amino acids, 44,219 Daltons
Comments: essential gene in yeast

Figure 27
YGR046W



IRA2 



Figure 28. 
YJR041C

GenBank No. 1015693
Chromosome
Protein 1173 amino acids, 135,096 Daltons
Comments: essential gene in yeast; contains a leucine zipper; potential
transmembrane domain

Figure 31
HES1

GenBank No. 1420543
Chromosome XV
Protein 433 amino acids, 49,502 Daltons
Comments: implicated in ergosterol pathways; related to human oxysterol
binding protein

Figure 35
homology to human oxysterol binding protein
ERG2
0.1
C-8 sterol isomerase
PAU5
member of seripauperin protein/gene family
lanosterol synthase
CYB5
cytochrome b5
YJL105w
+0.81
0.1
similar to Ykr029p
+0.79
0.3
3.7
ERG11
cytochrome P450 lanosterol 14α-demethylase
HEM14
protoporphyrinogen oxidase
+0.76
0.8
squalene synthetase
TIR1
+0.74
cold-shock induced - serine-alanine-rich
ERG8
+0.70
6.0
phosphomevalonate kinase
+0.69
0.5
9.6
SAM: delta 24-methyltransferase

Figure 36.
FIGURE 39. YJL105w DNA Sequence

Sequence contains 1200bp of 5' promoter sequence.
Symbols: 1 to: 2883 from: chr10.gcg ck: 4711, 223552 to: 226434
Chromosome X Sequence

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide
sequence of Saccharomyces cerevisiae chromosome X.
Galibert, F., Alexandraki, D., Baur, A., Boles, E.,
Chalwatzis, N., Chuat, J. C., Coster, F., Cziepluch, C.,
De Haan, M., Domdey, H., ...

gcgseq.tmp.4454 Length: 2883 March 26, 1999 16:51 Type:
N Check: 6274

1 TGGAAAAGCT CACTGTGAGG TTCCTTGGAG CCAATAGTAA TACAGCACAA 
51 TCCAAGGAAA AATCTGGCCT ATATGCAAGG AAGGAGAGAT AGTCAAAAGC 
101 ATTCTTTCCC CTAGAAGTTG GTGCATATAT GGCATCGTTA AAACATATTA 
151 CCCCCAAAAT TTCTTCTCTA AACGATGTGC TTGGCCTTTG TTTTGGTTTT 
201 TGATGTCGGT CGTTTGAGGC CCCTTGCGGA AAATCGAGAT CGCCGAATGG 
251 CACGCGAGGG AAGGGAAATA AGGTTTAAAG GCACTGAAAC AATAGGCAAG 
301 AAGTAGGCGA GAGCCGACAT ACGAGACTAA TGTGTCCGCG TTTCTAAGGC 
351 CACTTTTCAA TGAAACGGAT ATTGATATGC TAGTAAAAGG ACGAGCTCAA 
401 GAGCGAAAAT ATAAGTAAAG AATTCGAGTG CACTTGTCTC CATGCAGCAA
451 GATTTCATAT GAGTCTTTTT TATCTTTTTA CTTTTTACAT TACACGATAT 
501 GCACTTTATG AAAATTTAAC GAGGTTGGAA GCCGGATAAT CAACCAAAAT 
551 CAGGCACGAA GGCACACTCG TATATGCATG TTGTTGAAAC TCTGTTACGC 
601 TGAACTAACA ATCACACATG TAGAGGTCAC CGGGAAAAGT TGCGACCCCA 
651 TGGAAGGTGG ATCTCTTCGT TTGGCTTTGC TTGGCTGGCG GCATTGCGCT 
701 TCTTCGCTTA TACCCGTCTC TTGACGCTCG AGCTCGTTCA TTGAGATACC 
751 TTTATTCTTG CACATTTTCT GGCTTTTTTC GCTACTCGGG TACATGTAAT 
801 CATGCACACA GAAGGTGCTG TAGGGTGAAA GTTCCTTTGT GCTGTCGTTT 
851 GTTTTTAATG CCAAACTTTC TGGTGATCAA TAACCACCTC TTTTTCCTTC 
901 AGGAAACCTT ATTATTGTTC TTGGATAGTA CTAGGAAGTA TATAAGGAAC 
951 CTCGATTTTG GTATTGCACG GCTATACACA TCTAAGAAAC TTTGTATAAA 
1001 AGGTGGCTAC CCTATTCATA GCTTGATATC AATAGGCCAT CTCATCACTT 
1051 TTTATTGAAA AGGAAAGGAG GGAAATATAT CTGATTCAAA TTACTTGTTT 
1101 GCTTCTCTTT AAGACAAAAG CATAGATAAT TTCAGCGTGG AACGCCGGAA 
1151 TAAGATTGGT ACCCTCGTCA GAAAGTTACA AATACCGCTT CATCTTCAAA 
1201 ATGACTTCAC CGGAATCACT ATCTTCTCGT CATATCAGGC AAGGAAGGAC 
1251 ATACACAACC ACAGACAAGG TCATATCGCG GTCGTCGTCG TACTCATCTA 
1301 ATAGTTCAAT GTCTAAAGAT TACGGCGATC ACACACCCTT GTCCGTCAGC
1351 AGTGCAGCTT CAGAGACATT ACCCTCACCT CAGTATATGC CGATAAGGAC 
1401 ATTCAATACA ATGCCTACAG CTGGCCCAAC GCCTTTACAT TTATTTCAAA 
1451 ATGACAGGGG CATTTTCAAC CATCATTCTT CATCAGGCTC ATCAAAAACG 
1501 GCATCAACAA ATAAAAGAGG AATAGCAGCA GCAGTAGCAT TGGCAACTGC 
1551 TGCCACCATA CCATTTCCAC TGAAAAAACA GAATCAAGAT GATAATTCCA 
1601 AGGTCTCGGT AACACACAAT GAATCATCGA AAGAAAATAA AATTACACCC 
1651 TCCATGAGAG CAGAAGATAA CAAACCTAAA AATGGTTGCA TCTGCGGTTC 
1701 AAGTGACTCC AAGGATGAGT TGTTTATACA GTGTAACAAA TGTAAAACGT
1751 GGCAGCACAA GTTATGTTAT GCTTTCAAAA AATCAGATCC AATAAAAAGA 

1801 GATTTTGTTT GCAAAAGATG TGACAGTGAT ACGAAAGTGC AGGTTAATCA
1851 AGTAAAACCA ATGATATTCC CTAGAAAAAT GGGAGATGAG CGATTATTTC
1901 AATTTTCATC CATAGTGACA ACTTCAGCAT CGAACACAAA TCAGCATCAA
1951 CAGTCTGTGA ATAACATAGA GGAACAGCCC AAGAAACGTC AACTTCATTA
2001 TACCGCCCCA ACAACTGAAA ATAGCAATAG TATACGGAAA AAATTGAGGC
2051 AAGAAAAACT GGTAGTATCA AGCCACTTTC TGAAGCCACT ACTGAATGAG
2101 GTAAGTTCTT CCAATGACAC GGAATTCAAA GCAATAACAA TATCAGAGTA
2151 TAAGGACAAA TATGTTAAGA TGTTTATTGA TAACCATTAT GATGACGATT
2201 GGGTTGTTTG TTCTAACTGG GAAAGCTCAA GGTCAGCTGA CATCGAGGTA
2251 AGAAAATCAT CAAATGAAAG AGATTTTGGA GTCTTCGCTG CAGATTCTTG
2301 TGTTAAAGGT GAGCTAATTC AAGAATATTT GGGCAAAATT GATTTTCAAA
2351 AAAATTATCA GACAGATCCA AATAATGACT ATCGTTTGAT GGGAACGACA
2401 AAACCTAAAG TACTTTTTCA TCCACATTGG CCTTTATATA TAGACTCTCG
2451 AGAAACAGGC GGATTAACAA GATACATAAG ACGGAGTTGT GAGCCCAATG
2501 TGGAACTAGT AACGGTAAGA CCGCTTGACG AAAANCCAAG AGGAGATAAT
2551 GATTGTAGAG TTAAATTTGT TTTAAGGGCT ATAAGAGATA TTCGTAAGGG
2601 AGAAGAGATA AGCGTAGAAT GGCAATGGGA TTTGAGAAAT CCTATTTGGG
2651 AGATAATAAA TGCATCTAAA GATTTGGATT CCCTACCGGA TCCCGACAAG
2701 TTCTGGTTGA TGGGGTCAAT AAAGACTATT TTAACAAATT GTGATTGTGC
2751 ATGTGGGTAC TTGGGCCATA ATTGTCCAAT AACTAAAATC AAAAACTTTT
2801 CTGAAGAATT CATGAGGAAT ACGAAGGAAT CCCTATCTAA TAAATCTTAC
2851 TTTAATACAA TAATGCACAA CTGTAAGCCA TAA

FIGURE 39 (cont).
FIGURE 40. YJL105w Protein Sequence

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide sequence of
Saccharomyces cerevisiae chromosome X.

Galibert, F., Alexandraki, D., Baur, A., Boles, E., Chalwatzis, N.,
Chuat, J. C., Coster, F., Cziepluch, C., De Haan, M., Domdey, H.,
Durand, P., Entian, K. D., Gatius, M., Goffeau, A., Grivell, L. A., et al.

YJL105W Length: 560 March 26, 1999 16:52 Type: P Check: 103

1 MTSPESLSSR HIRQGRTYTT TDKVISRSSS YSSNSSMSKD YGDHTPLSVS 

51 SAASETLPSP QYMPIRTFNT MPTAGPTPLH LFQNDRGIFN HHSSSGSSKT 

101 ASTNKRGIAA AVALATAATI PFPLKKQNQD DNSKVSVTHN ESSKENKITP 

151 SMRAEDNKPK NGCICGSSDS KDELFIQCNK CKTWQHKLCY AFKKSDPIKR 

201 DFVCKRCDSD TKVQVNQVKP MIFPRKMGDE RLFQFSSIVT TSASNTNQHQ 

251 QSVNNIEEQP KKRQLHYTAP TTENSNSIRK KLRQEKLVVS SHFLKPLLNE

301 VSSSNDTEFK AITISEYKDK YVKMFIDNHY DDDWVVCSNW ESSRSADIEV

351 RKSSNERDFG VFAADSCVKG ELIQEYLGKI DFQKNYQTDP NNDYRLMGTT 

401 KPKVLFHPHW PLYIDSRETG GLTRYIRRSC EPNVELVTVR PLDEKPRGDN 

451 DCRVKFVLRA IRDIRKGEEI SVEWQWDLRN PIWEIINASK DLDSLPDPDK 

501 FWLMGSIKTI LTNCDCACGY LGHNCPITKI KNFSEEFMRN TKESLSNKSY 

551 FNTIMHNCKP 
FIGURE 41. YMR134W DNA Sequence

Sequence contains 1200bp of 5' promoter sequence.

Symbols: 1 to: 1914 from: chr13.gcg ck: 8335, 536637 to: 538550

Chromosome XIII Sequence

Nature 387:90-93 [97313268] (1997) The nucleotide sequence of
Saccharomyces cerevisiae chromosome XIII.

Bowman, S., Churcher, C., Badcock, K., Brown, D., Chillingworth, T.,
Connor, R., Dedman, K., Devlin, K., Gentles, S., Hamlin, N., Hunt, S., ...

gcgseq.tmp.31828 Length: 1914 March 26, 1999 16:58 Type: N Check: 3324



1 TACAATAACA AGCCAGGTGC AAGGCAATAA TAACGGTACA AAGGTCTGTT

51 TCACAGAAGG TCCAAAAGTT AGTAGCTACA CAAATCCGAA CACGCAATTT 

101 CAAACTCAAA ACATGATTAT GGATTTCAGT CAACGTTATC AGGAAGAATC 

151 TGAAAGAGAG TCAAATAATC GTTCAAATAT AACTTTACCA CACGACAGCA 

201 TTCAAATAGC TCAACAAATA TGGCCAAACA CGGATTTAAA TGTAGTACAA 

251 TCTTCACAAG ACCTCAACAC TCCAATGGCT ACGCAAACTG TTTTGGGTCG

301 TCCTGAGTCG CTAATTGTAC AGCCATTGGA GGTTTCTCAA TCTCCACCAA 

351 ACACTACCAA CTGCCTTCCT AATGCAGAAA ACAAAAAGAA AAAAGTCGAC 

401 ACCACTTCTG ATTTTACTTC AAGAAAGGAG ATTGCTCTGT GTAAAACTGG 

451 TTTATTAGAA ACTATTCATA TACCAAAGGA AAGGGAAAGT CAGATGCAAA 

501 GCGTCACTGG TTTAGATGCA ACACCAACGA TTATATGGAG CCCCGGGAAA 

551 GACAACACGG CGAAGAAAAA TACCAGTAAT AAGAAA?U^TA TTGATGATAA 

601 ACTAACAAAC CCCCAAAAAT CTGGAAATAC ACATACCCCT GATAGAAATA 

651 AAGAAGTGCT ACCTAACGGC ACACTTAATG AAACGAGGAA AGAAGCATCG 

701 CCAAGCGAAG GATTAACGAT AAGAGTTAAA AACGTTAATC GGAATGCGTC 

751 AAGAAAAATA TCTAAGCGGC TAATCAAGGA AAAGTTGAAA GACGAAGAAT 

801 TCATGAAATG GGTATGTATG CATTTGCAAG AAACTGAGCT GTTTCCCCCT 

851 CTTATCCACT CATTTTCTCT GACTTGACAA AGAAATACTA ACTAACAACT 

901 TTTGCCACTA CAAATATGAA TGAAAAGGTT AATAAGGTTG AAACGGTTCT 

951 CAATAAAATG TTCGAAAAGT GAACCCTTTT TTTGCAATTC CTTTTTACAC 

1001 TAGCCACGAA GTAAAATGGA AAAGTAAACC CGAGTTTCGG CAATATCGCT 

1051 AAGCAAGAAG AGCAAGCTCG TTTAAGTAAG CCTTTATGAA AAAAAAACAA 

1101 AATATAAAGC ATTATAT^AAA TTGAATCACA TCGCAAATCT GCAATATACT 

1151 TGGAAGTGTT TATAGCAAAG TGTGGTATAG AAAAAGAACC AAAGGCCGGT 

1201 ATGTCGTTAA AGGATAGGTA TCTAAATCTC GAATTAAAAT TAATAAATAA 

1251 ACTACAGGAG TTGCCATATG TTCATCAATT TATCCATGAT CGAATAAGTG 

1301 GTAGGATAAC TCTCTTTTTG ATAGTGGTTG GTACGCTTGC ATTTTTTAAC 

1351 GAACTGTATA TAACGATCGA AATGAGTCTT CTACAAAAGA ACACATCAGA

1401 AGAACTAGAG CGTGGAAGAA TCGATGAAAG TCTGAAGCTT CATCGGATGT 

1451 TGGTGAGTGA TGAATATCAC GGTAAAGAAT ACAAAGACGA GAAAAGCGGT 

1501 ATTGTTATTG AAGAGTTCGA AGATCGCGAT AAGTTTTTTG CAAAACCTGT 

1551 GTTTGTATCA GAATTGGATG TCGAATGTAA TGTTATTGTA GATGGGAAAG 

1601 AACTTCTGTC CACCCCATTA AAATTTCATG TTGAATTTTC TCCAGAGGAT 

1651 TATGAAAATG AAAAAAGACC TGAGTTTGGT ACTACCTTGC GTGTATTGAG 

1701 GCTGAGACTT TACCACTACT TTAAAGATTG CGAAATATAT CGCGATATAA 

1751 TTAAGAATGA GGGCGGTGAA GGGGCAAGAA AGTTTACGAT TTCCAACGGT 

1801 GTCAAAATTT ACAATCATAA AGATGAACTA CTGCCATTGA ATATCGATGA 

1851 TGTTCAATTA TGTTTCCTGA AGATTGATAC GGGAAACACG ATAAAATGCG 

1901 AATTCATACT ATGA 
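Figures 39, 41 and 43 each note that the listing "contains 1200bp of 5' promoter sequence", so the coding region starts at position 1201. The Python sketch below is a minimal illustration, assuming the standard nuclear genetic code and the numbered, space-separated listing layout used in these figures: it strips position numbers and spacing, drops the first 1,200 bases and translates codon by codon up to the first stop codon. The helper names `coding_sequence` and `translate` are hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def coding_sequence(listing, promoter_len=1200):
    """Keep only letters (dropping position numbers and spaces), then skip the promoter."""
    bases = "".join(ch for ch in listing.upper() if ch.isalpha())
    return bases[promoter_len:]


def translate(orf):
    """Translate codon by codon until the first stop codon ('*')."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Positions 1201-1230 of the YMR134W listing in Figure 41.
print(translate("ATGTCGTTAAAGGATAGGTATCTAAATCTC"))  # MSLKDRYLNL
```

The output matches the first ten residues of the YMR134W protein sequence in Figure 42. Because the sketch keeps every letter, OCR debris that is not a letter is dropped silently while stray letters would shift the reading frame, so a listing should be checked for such garbling before it is translated.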
FIGURE 42. YMR134W Protein Sequence

Nature 387:90-93 [97313268] (1997) The nucleotide sequence of
Saccharomyces cerevisiae chromosome XIII.

Bowman, S., Churcher, C., Badcock, K., Brown, D., Chillingworth, T.,
Connor, R., Dedman, K., Devlin, K., Gentles, S., Hamlin, N., Hunt, S.,
Jagels, K., Lye, G., Moule, S., Odell, C., Pearson, O., Rajandream, et al.

YMR134W Length: 237 March 26, 1999 16:59 Type: P Check: 2966
1 MSLKDRYLNL ELKLINKLQE LPYVHQFIHD RISGRITLFL IVVGTLAFFN
51 ELYITIEMSL LQKNTSEELE RGRIDESLKL HRMLVSDEYH GKEYKDEKSG 
101 IVIEEFEDRD KFFAKPVFVS ELDVECNVIV DGKELLSTPL KFHVEFSPED 
151 YENEKRPEFG TTLRVLRLRL YHYFKDCEIY RDIIKNEGGE GARKFTISNG 
201 VKIYNHKDEL LPLNIDDVQL CFLKIDTGNT IKCEFIL 
FIGURE 43. YER044c Sequence

Sequence contains 1200bp of 5' promoter sequence

Symbols: 1 to: 1647 from: chr5.gcg /rev ck: 9036, 237569 to: 239215

Chromosome V Sequence

Nature 387:78-81 [97313264] (1997) The nucleotide sequence of
Saccharomyces cerevisiae chromosome V.

Dietrich, F. S., Mulligan, J., Hennessy, K., Yelton, M. A., Allen, E.,
Araujo, R., Aviles, E., Berno, A., Brennan, T., Carpenter, J., Chen, ...

gcgseq.tmp.2512 Length: 1647 March 26, 1999 16:38 Type: N Check: 8794



1 AACACTCCAA ATCTTGTTAG TTTCTCATTA TTCGCATCGC ATAGATTCTG
51 ATTCTTCTTT TAAGAGGACA CTGATAGACG TTCATGTTTT CAATTTCATC
101 GCCAAGTTTC TGTTTAATAG AATTTTATTG AAGAAGAACC AAAACGATCC
151 AAAATGGCTT CAAAACTTTT ACGACCAGGG AGATGGCAAA CATTTATGTG
201 ATAAAGTTGA CTACAAGCGC TTGTGTTCGT TGCATTTTAC CCTTATTTAC
251 TCTATTATTA ACATTCAACT CATCAAAATC AAGACAAACC AAACATTTGA
301 ACCGCAGATA TTAAAATACG TATCTGTTCT GAAATTAATT GAACACATAC
351 TTATCATCAT CGAAAGTCTG ATACATGTAC TTATTAGATT TGTATCGAAG
401 CATAAACTAA TATGCATCAA CCGGAAAAAG GCGTACTGTC GAGTATACCT
451 CGAAAGAGAA TTGAGTTTGA AGAAAACCTA CTTAAAGAAC TTTTACAGTG
501 TAATAAGCGG TGTCCCAGAA AAAGAGTTAG GGGGTCTATT GAAAATACTC
551 AAGATAGTTA TTCTATCATT GCTCGAGACA TTTGAAAGCA TTGAATGGCA
601 GCACTTAAAA CCTTTCCTGG AAAAATTTCC GGCTCATGAA ATATCGCTTC
651 AGAAGAAAAG GAAATATATA CAGGCGGCCT TATTAATTAC TGCCGAAAGA
701 AATTTGATAG CGCGCTTTCG ATTGTCAAGA TGGTTCAATG AGACAGAAAA
751 CATTTAATTT TTCTTTTGCA GTAGGAGGCG CATTATAAAA CACAAAAATA
801 TCGAAAGCTC TTTCATTTCG GGGACAACAA CTTCAGTTGA AAATTACAGT
851 GAACACAACA TCTTCCCCAA CAGACCTACA TTAAAACGCT TCTTCCGGAC
901 TTGCCCATGA TTAACCTAAT CTTATACGAA CTGAATTAAA CTTTACGGTA
951 TTACCGATAG GAAACTTCTA TTTTATGATT TTTTCGTTCG GGGACGGAAC
1001 GAACAGGAAA CAAAAAAAAA GGTACGATCC ATTGTATTCT CTACCCCCGT
1051 ATATAAAACT AAGCTGAACA AGCCTGTTGT TTTGCTTTAC TATTGCTACT
1101 ATTTTTGACG TAAACGCATT GACTAATTTC AGGTTTTTAT ATTCTTGACA
1151 CTAGCTAGAC CATAGTATCG AAGGATTCAA ATACACTAAA GTATCAGATA
1201 ATGTTCAGCC TACAAGACGT AATAACTACA ACCAAGACCA CCTTGGCAGC
1251 AATGCCAAAA GGTTACTTAC CAAAATGGTT ACTTTTCATT TCCATTGTAT
1301 CAGTCTTCAA TTCTATCCAG ACTTACGTTT CTGGTTTAGA ATTGACACGT
1351 AAAGTCTACG AAAGAAAACC CACTGAAACA ACCCATTTGA GTGCAAGAAC
1401 TTTCGGTACT TGGACCTTTA TTTCCTGTGT TATCAGATTC TATGGGGCTA
1451 TGTACTTGAA TGAACCACAC ATTTTCGAAT TGGTCTTCAT GTCTTATATG
1501 GTTGCCCTAT TCCACTTCGG CTCTGAATTA TTGATCTTTA GAACTTGTAA
1551 GTTGGGAAAG GGATTCATGG GTCCATTGGT TGTCTCAACC ACCTCTTTGG
1601 TTTGGATGTA CAAACAAAGA GAATACTACA CTGGTGTTGC TTGGTAA
FIGURE 44. YER044C Protein Sequence

Nature 387:78-81 [97313264] (1997) The nucleotide sequence of
Saccharomyces cerevisiae chromosome V.

Dietrich, F. S., Mulligan, J., Hennessy, K., Yelton, M. A., Allen, E.,
Araujo, R., Aviles, E., Berno, A., Brennan, T., Carpenter, J., Chen,
Cherry, J. M., Chung, E., Duncan, M., Guzman, E., Hartzell, G., et al.

YER044C Length: 148 March 26, 1999 16:40 Type: P Check: 3540

1 MFSLQDVITT TKTTIAAMPK GYLPKWLLFI SIVSVFNSIQ TYVSGLELTR

51 KVYERKPTET THLSARTFGT WTFISCVIRF YGAMYLNEPH IFELVFMSYM

101 VALFHFGSEL LIFRTCKLGK GFMGPLVVST TSLVWMYKQR EYYTGVAW
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FX6UBE 45. Mouse EST with Similarxty to TER044c 



LOCUS AI386195 455 bp mRNA EST 27-JAN-1999
DEFINITION mq60h05.y1 Soares 2NbMT Mus musculus cDNA clone IMAGE:583161 5'
similar to SW:YEN4_YEAST P40030 HYPOTHETICAL 17.1 KD PROTEIN IN
SAH1-MEI4 INTERGENIC REGION. mRNA sequence.
ACCESSION AI386195
NID g4199658
KEYWORDS EST.
SOURCE house mouse.
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 455)
AUTHORS Marra,M., Hillier,L., Kucaba,T., Martin,J., Beck,C., Wylie,T.,
Underwood,K., Steptoe,M., Theising,B., Allen,M., Bowers,Y.,
Person,B., Swaller,T., Gibbons,M., Pape,D., Harvey,N., Schurk,R.,
Ritter,E., Kohn,S., Shin,T., Jackson,Y., Cardenas,M., McCann,R.,
Waterston,R. and Wilson,R.
TITLE The WashU-NCI Mouse EST Project 1999
JOURNAL Unpublished (1999)
COMMENT Contact: Marra M/WashU-NCI Mouse EST Project 1999
Washington University School of Medicine
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108, USA
Tel: 314 286 1800
Fax: 314 286 1810
Email: mouseest@watson.wustl.edu
This clone is available royalty-free through LLNL; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:357809
This read is a RESEQUENCE of a previously sequenced mouse clone
This read has been verified (found to hit its original self in the
correct orientation)
Seq primer: -40RP from Gibco
High quality sequence stop: 455.
FEATURES Location/Qualifiers
source 1..455
/organism="Mus musculus"
/strain="C57BL/6J"
/note="Vector: pT7T3D-Pac (Pharmacia) with a modified
polylinker; Site_1: Not I; Site_2: Eco RI; 1st strand cDNA
was primed with a Not I - oligo(dT) primer [5'
TGTTACCAATCTGAAGTGGGAGCGGCCGCGTTTTTTTTTTTTTTTTTTTTTTTTT
3']; double-stranded cDNA was ligated to Eco RI adaptors
(Pharmacia), digested with Not I and cloned into the Not I
and Eco RI sites of the modified pT7T3 vector. RNA
provided by Dr. Bertrand Jordan. Library went through two
rounds of normalization, and was constructed by Bento
Soares and M.Fatima Bonaldo."
/db_xref="taxon:10090"
/clone="IMAGE:583161"
/clone_lib="Soares 2NbMT"
/sex="male"
/tissue_type="Thymus"
/dev_stage="4 weeks"
/lab_host="DH10B"
BASE COUNT 94 a 131 c 112 g 117 t 1 others
ORIGIN



1 tgcggatgct gctgatactg ctgcagtagt actggatcgt caggcagagc gccctctctt 
61 ggaggggagt catgagccgc ttcctgaatg tgttacgaag ctggctggtt atggtgtcca 
121 ttatagccat ggggaacaca ctccagagct tccgagacca cacttttctc tacgagaagc 
181 tctacactgg caagccaaac cttgtgaatg gcctccaagc ccggaccttt gggatctgga 
241 cgctgctctc atcagtgatt cgctgcctct gtgccattga catccacaac aaaacactct 
301 atcacatcac actgtggaca ttcctcctcg ccctgngaca cttcctctca gagttgtttg 
361 tatttggaac agcagctccc acagttggtg tgctggcacc cctgatggta gcaagtttct 
421 caatcctggg catgctggtc gggctcccgt accta 



// 



FIGURE 45 (cont).



FIGURE 46. Human EST with Similarity to YER044C

LOCUS W28235 839 bp mRNA EST 08-MAY-1996 

DEFINITION 43h8 Human retina cDNA randomly primed sublibrary Homo sapiens
cDNA, mRNA sequence.
ACCESSION W28235 
NID g1308183
KEYWORDS EST.
SOURCE human.

ORGANISM Homo sapiens 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia; 
Eutheria; Primates; Catarrhini; Hominidae; Homo. 
REFERENCE 1 (bases 1 to 839) 

AUTHORS Macke, J., Smallwood, P. and Nathans, J. 
TITLE Adult Human Retina cDNA 
JOURNAL Unpublished (1996) 
COMMENT 

Contact: Dr. Jeremy Nathans 

Dr. Jeremy Nathans, Dept. of Molecular Biology and Genetics 

Johns Hopkins School of Medicine 

725 North Wolfe Street, Baltimore, MD 21205 

Tel: 410 955 4678 

Fax: 410 614 0827 

Email: jeremy_nathans@qmail.bs.jhu.edu
Clones from this library are NOT available. 
PCR Primers

FORWARD : CTTTTGAGCAAGTTCAGCCTGGTTAAGT 
BACKWARD: GAGGTGGCTTATGAGTATTTCTTCCAGGGTAA 
Seq primer: GGGTAAA7VAGCAAAAGAATT
FEATURES Location/Qualifiers 
source 1..839

/organism="Homo sapiens" 

/note="Organ: eye; Vector: lambda gt10; Site_1: EcoRI;
Site_2: EcoRI; The library used for sequencing was a
sublibrary derived from a human retina cDNA library.
Inserts from retina cDNA library DNA were isolated,
randomly primed, PCR amplified, size-selected, and cloned
into lambda gt10. Individual plaques were arrayed and
used as templates for PCR amplification, and these PCR
products were used for sequencing."
/db_xref="taxon:9606" 

/clone_lib="Human retina cDNA randomly primed
sublibrary"

/sex="mixed (males and females)" 

/tissue_type="retina"

/dev_stage="adult" 

/lab_host="E. coli strain K802" 
BASE COUNT 127 a 141 c 136 g 140 t 295 others 

ORIGIN 

1 gnnnnnngnn nnnnnnnnnt tnttgagnac cgcagtngca gcagcagcag ccgctgncgc
61 aaacaagccc tcccacgttt gaggggagtc atgagccgtt tcctgaatgt gttaagaagt
121 tggctggtta tggtgtccat catagccatg gggaacacgc tgcagagctt ccgagaccac
181 acttttctct atgaaaagct ctacactggc aagccaaacc ttgtgaatgg cctccaagct
241 cggacctttg ggatctggac gctgctctca tcagtgattc gctgcctctg tgccattgac
301 attcacaaca agacgctcta tcacatcaca ctctggacct tcctccttgc cctggggcat
361 ttcctctctg agttgtttgt cttatggaac tgcagctccc acgattggng tcctggcanc
421 cctgatggtg gnaagtttct ccatcctggg tattgtggtc ggctccngta ttttagaagt
481 agaaccagtt ccagacagaa gaagagaact gaggcagaat atcaacccca gggtggatca
541 antgggttac aagtggttna aaannnnnnn nnnnnnnnnc nnnntnntnt naannnnnnn
601 nnnnnnnnnn nnnnnnnnna nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
661 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
721 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 
781 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnc 



// 



FIGURE 46 (cont).



52/88 



SUBSTITUTE SHEET (RULE 26) 



WO 00/58521

PCT/US00/08604



FIGURE 47. Rat EST with Similarity to YER044C



LOCUS AI172515 475 bp mRNA EST 11-FEB-1999
DEFINITION UI-R-C2p-nu-d-02-0-UI.s1 UI-R-C2p Rattus norvegicus cDNA clone
UI-R-C2p-nu-d-02-0-UI 3', mRNA sequence.
ACCESSION AI172515
NID g3712555
KEYWORDS EST.
SOURCE Norway rat.
ORGANISM Rattus norvegicus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
REFERENCE 1 (bases 1 to 475)
AUTHORS Bonaldo,M.F., Lennon,G. and Soares,M.B.
TITLE Normalization and subtraction: two approaches to facilitate gene
discovery
JOURNAL Genome Res. 6 (9), 791-806 (1996)
MEDLINE 97044477
COMMENT Contact: Soares, MB
Program for Rat Gene Discovery and Mapping
University of Iowa
451 Eckstein Medical Research Building, Iowa City, IA 52242, USA
Tel: 319 335 8250
Fax: 319 335 9565
Email: msoares@blue.weeg.uiowa.edu
The sequence tag present in the cDNA between the NotI site and
the oligo-dT track served to identify it as a clone from the
normalized adult Placenta library. cDNA Library Preparation: M. Fatima
Bonaldo, Ph.D. Clone distribution: clones will be available
through Research Genetics
Seq primer: M13 Forward.
FEATURES Location/Qualifiers
source 1..475
/organism="Rattus norvegicus"
/strain="Sprague-Dawley"
/note="Vector: pT7T3D-Pac (Pharmacia) with a modified
polylinker; Site_1: Not I; Site_2: Eco RI; The UI-R-C2p
library is a subtracted library derived from the UI-R-C1
library, which is a subtracted library derived from the
UI-R-C0 library. The UI-R-C0 library consisted of a
mixture of individually tagged normalized libraries
constructed from rat placenta, adult lung, brain, liver,
kidney, heart, spleen, ovary, muscle, 8, 12 and 18-day
embryo. The tag is a string of 3-5 nucleotides present
between the Not I site and the oligo-dT track which
allows identification of the library of origin of a clone
within the mixture. The subtracted library (UI-R-C2p) was
constructed as follows: PCR amplified cDNA inserts from
UI-R-C1 clones from which 3' ESTs had been derived was
used as a driver in a hybridization with the UI-R-C1
library in the form of single-stranded circles. The
remaining single-stranded circles (subtracted library)
was purified by hydroxyapatite column chromatography,
converted to double-stranded circles and electroporated
into DH10B bacteria (Life Technologies) to generate the
UI-R-C2p library. This procedure has been previously
described (Bonaldo, Lennon and Soares, Genome Research
6: 791-806, 1996)"
/db_xref="taxon:10116"
/clone="UI-R-C2p-nu-d-02-0-UI"
/clone_lib="UI-R-C2p"
/dev_stage="adult"
/lab_host="DH10B (Life Technologies)"
BASE COUNT 115 a 112 c 126 g 119 t 3 others 

ORIGIN 

1 tttttttttt tttttttctg tctggatact ggttctgctt ctaggtaccg gagcccaact 
61 agcataccca ggattgagaa acttgctacc atcaagggtg ccagcacacc aactgtggga 
121 gccgctgttc caaatacaaa caactccgag aggaagtgtc ccagggcaag gaggaatgtc 
181 cacagtgtga tgtgatagag tgttttgttg tggatgtcaa tggcacagag gcagcgaatc 
241 actgaagaga gcagcgtcca gatcccaaag gtccgggctt ggaggccatt cacaaggttt 
301 ggtttgccag tgtanagctt ttcatanaga aaagtgtggt ctcggaagct ctggagcgtg 
361 ttncccatgg ctatgatgga caccataacc agccagcttc gtagcacatt caggaagcgg 
421 ctcatgactc ccctcaaaga gagggcgctc tgcctgaccc tcgtgccgaa ttctt 

FIGURE 47 (cont).
FIGURE 48. YLR100W DNA Sequence

Sequence contains 800bp of 5' promoter sequence.

Symbols: 1 to: 1844 from: chr12.gcg ck: 2436, 341011 to: 342854

Chromosome XII Sequence

Nature 387:87-90 [97313267] (1997) The nucleotide sequence of
Saccharomyces cerevisiae chromosome XII.

Johnston, M., Hillier, L., Riles, L., Albermann, K., Andre, B.,
Ansorge, W., Benes, V., Bruckner, M., Delius, H., Dubois, E., ...

gcgseq.tmp.10136 Length: 1844 March 26, 1999 15:19 Type: N Check: 2071



1 ACGTACAAAA AAGAGCACGC TGCTTTATTT ATACTTTTGT GCCACAAGAA 

51 TGATCAACAT CAACATAAAT ATCAACTAGT ATCTGCAACA CATCTGCTCC 

101 ACGGAACTAA ACCCGTTGAG CAGTGCCCCG TGGAAACGTA AACTATCGCA 

151 AATTGGGATT AACAAGCCAA AAACAGCCAA GCAAGATTCA CGAAACCGCG 

201 CCTCGTTTGG ACCCCGAAGG CCCATTTAAC GGCCGGCCGT TACAAGCAAG 

251 ATCGGCAGAG CAAACCACTC CCCAGCACCA CAGCACATCA CTGCACGAGC 

301 AACAATAACT AGAACATGGC AGATAGCGAG GATACCTCTG TGATCCTGCA 

351 GGGCATCGAC ACAATCAACA GCGTGGAGGG CCTGGAAGAA GATGGTTACC 

401 TCAGCGACGA GGACACGTCA CTCAGCAACG AGCTCGCAGA TGCACAGCGT 

451 CAATGGGAAG AGTCGCTGCA ACAGTTGAAC AAGCTGCTCA ACTGGGTCCT 

501 GCTGCCCCTG CTGGGCAAGT ATATAGGTAG GAGAATGGCC AAGACTCTAT 

551 GGAGTAGGTT CATTGAACAC TTTGTATAAG TGTTTGTTGT TTATGTATCC 

601 GCATATAGCA GTTATAACAG ATAAATGGCA CTTTTCGCAC ACCCGTTGTT 

651 TTATCTCCGA TAGTACGTGG GCCTTTATTT ATGGTCGTTT AACGAAAGAA 

701 CGGCATCTTG AATTGAGCAG GTATTTAAAA GATAGGACGA GAAACAAGCA 

751 CATGATCTGT GTCGAAAAAA AGTAGCAAAG AGAAAAAGTA GGAGGATAGG 

801 ATGAACAGGA AAGTAGCTAT CGTAACGGGT ACTAATAGTA ATCTTGGTCT

851 GAACATTGTG TTCCGTCTGA TTGAAACTGA GGACACCAAT GTCAGATTGA 

901 CCATTGTGGT GACTTCTAGA ACGCTTCCTC GAGTGCAGGA GGTGATTAAC 

951 CAGATTAAAG ATTTTTACAA CAAATCAGGC CGTGTAGAGG ATTTGGAAAT 

1001 AGACTTTGAT TATCTGTTGG TGGACTTCAC CAACATGGTG AGTGTCTTGA 

1051 ACGCATATTA CGACATCAAC AAAAAGTACA GGGCGATAAA CTACCTTTTC 

1101 GTGAATGCTG CGCAAGGTAT CTTTGACGGT ATAGATTGGA TCGGAGCGGT 

1151 CAAGGAGGTT TTCACCAATC CATTGGAGGC AGTGACAAAT CCGACATACA 

1201 AGATACAACT GGTGGGCGTC AAGTCTAAAG ATGACATGGG GCTTATTTTC 

1251 CAGGCCAATG TGTTTGGTCC GTACTACTTT ATCAGTAAAA TTCTGCCTCA 

1301 ATTGACCAGG GGAAAGGCTT ATATTGTTTG GATTTCGAGT ATTATGTCCG 

1351 ATCCTAAGTA TCTTTCGTTG AACGATATTG AACTACTAAA GACAAATGCC 

1401 TCTTATGAGG GCTCCAAGCG TTTAGTTGAT TTACTGCATT TGGCCACCTA 

1451 CAAAGACTTG AAAAAGCTGG GCATAAATCA GTATGTAGTT CAACCGGGCA 

1501 TATTTACAAG CCATTCCTTC TCCGAATATT TGAATTTTTT CACCTATTTC 

1551 GGCATGCTAT GCTTGTTCTA TTTGGCCAGG CTGTTGGGGT CTCCATGGCA 

1601 CAATATTGAT GGTTATAAAG CTGCCAATGC CCCAGTATAC GTAACTAGAT 

1651 TGGCCAATCC AAACTTTGAG AAACAAGACG TAAAATACGG TTCTGCTACC 

1701 TCTAGGGATG GTATGCCATA TATCAAGACG CAGGAAATAG ACCCTACTGG 

1751 AATGTCTGAT GTCTTCGCTT ATATACAGAA GAAGAAACTG GAATGGGACG 

1801 AGAAACTGAA AGATCAAATT GTTGAAACTA GAACCCCCAT TTAA 
FIGURE 49. YLR100W Protein Sequence

Nature 387:87-90 [97313267] (1997) The nucleotide sequence of
Saccharomyces cerevisiae chromosome XII.

Johnston, M., Hillier, L., Riles, L., Albermann, K., Andre, B.,
Ansorge, W., Benes, V., Bruckner, M., Delius, H., Dubois, E.,
Dusterhoft, A., Entian, K. D., Floeth, M., Goffeau, A., Hebling, U., et al.

YLR100W Length: 347 March 26, 1999 15:20 Type: P Check: 2853

1 MNRKVAIVTG TNSNLGLNIV FRLIETEDTN VRLTIVVTSR TLPRVQEVIN

51 QIKDFYNKSG RVEDLEIDFD YLLVDFTNMV SVLNAYYDIN KKYRAINYLF 

101 VNAAQGIFDG IDWIGAVKEV FTNPLEAVTN PTYKIQLVGV KSKDDMGLIF 

151 QANVFGPYYF ISKILPQLTR GKAYIVWISS IMSDPKYLSL NDIELLKTNA 

201 SYEGSKRLVD LLHLATYKDL KKLGINQYVV QPGIFTSHSF SEYLNFFTYF

251 GMLCLFYLAR LLGSPWHNID GYKAANAPVY VTRLANPNFE KQDVKYGSAT 

301 SRDGMPYIKT QEIDPTGMSD VFAYIQKKKL EWDEKLKDQI VETRTPI 
FIGURE 50. Human EST with Similarity to YLR100W



LOCUS R92053 454 bp mRNA EST 25-AUG-1995
DEFINITION yp96c01.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA
clone IMAGE:195264 5', mRNA sequence.
ACCESSION R92053
NID g959593
KEYWORDS EST.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 454)
AUTHORS Hillier,L., Clark,N., Dubuque,T., Elliston,K., Hawkins,M.,
Holman,M., Hultman,M., Kucaba,T., Le,M., Lennon,G., Marra,M.,
Parsons,J., Rifkin,L., Rohlfing,T., Soares,M., Tan,F.,
Trevaskis,E., Waterston,R., Williamson,A., Wohldmann,P. and
Wilson,R.
TITLE The WashU-Merck EST Project
JOURNAL Unpublished (1995)
COMMENT Contact: Wilson RK
Washington University School of Medicine
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
Tel: 314 286 1800
Fax: 314 286 1810
Email: est@watson.wustl.edu
Insert Size: 1067
High quality sequence stops: 337
Source: IMAGE Consortium, LLNL
This clone is available royalty-free through LLNL; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
Insert Length: 1067 Std Error: 0.00
Seq primer: M13RP1
High quality sequence stop: 337.
FEATURES Location/Qualifiers
source 1..454
/organism="Homo sapiens"
/note="Organ: Liver and Spleen; Vector: pT7T3D (Pharmacia)
with a modified polylinker; Site_1: Pac I; Site_2: Eco RI;
1st strand cDNA was primed with a Pac I - oligo(dT) primer
[5' AACTGGAAGAATTAATTAAAGATCTTTTTTTTTTTTTTTTTTT 3'];
double-stranded cDNA was ligated to Eco RI adaptors
(Pharmacia), digested with Pac I and cloned into the Pac I
and Eco RI sites of the modified pT7T3 vector. Library
went through one round of normalization. Library
constructed by Bento Soares and M.Fatima Bonaldo."
/db_xref="GDB:3764314"
/db_xref="taxon:9606"
/clone="IMAGE:195264"
/clone_lib="Soares fetal liver spleen 1NFLS"
/sex="male"
/dev_stage="20 week-post conception fetus"
/lab_host="DH10B (ampicillin resistant)"
BASE COUNT 115 a 111 c 96 g 129 t 3 others
ORIGIN

1 tttgagacca atgtctttgg ccattttatc ctgattcggg aactggagcc tctcctctgt 

61 cacagtgaca atccatctca gctcatctgg acatcatctc gcagtgcaag gaaatctaat 

121 ttcagcctcg aggacttcca gcacagcaaa ggcaaggaac cctacagctc ttccaaatat 

181 gccactgacc ttttgagtgt ggctttgaac aggaacttca accagcaggg tctctattcc 

241 aatgtggcct gtccaggtac agcattgacc aatttgacat atggaattct gcctccgttt 

301 atatggacgc tgttggatgc cggcaatatt gctacttcgc ttttttggca aatggcattc 

361 actttggaca ccatataatg ggaacaggaa gntatgggta tgggnttttc ccaccaaaag 
421 gctggaatcn tttcaatcct ctggatccaa atat 

// 



FIGURE 50 (cont).
FIGURE 51. Mouse EST with Similarity to YLR100W

LOCUS AI226514 1039 bp mRNA EST 29-OCT-1998

DEFINITION uj07d08.y1 Sugano mouse liver mlia Mus musculus cDNA clone
IMAGE:1891215 5' similar to TR:Q62904 Q62904 OVARIAN-SPECIFIC
PROTEIN. mRNA sequence.
ACCESSION AI226514 
NID g3809567 
KEYWORDS EST.
SOURCE house mouse. 

ORGANISM Mus musculus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia; 

Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 
REFERENCE 1 (bases 1 to 1039) 

AUTHORS Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N.,
Dubuque,T., Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
Waterston,R.
TITLE The WashU-HHMI Mouse EST Project 

JOURNAL Unpublished (1996) 
COMMENT 

Contact: Marra M/Mouse EST Project 

WashU-HHMI Mouse EST Project 

Washington University School of Medicine

4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108 

Tel: 314 286 1800 

Fax: 314 286 1810 

Email: mouseest@watson.wustl.edu 

This clone is available royalty-free through LLNL ; contact the 
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI: 975539 

Seq primer: custom primer used 
High quality sequence stop: 509. 
FEATURES Location/Qualifiers 
source 1..1039
/organism="Mus musculus"
/strain="C57BL"

/note="Organ: liver; Vector: pME18S-FL3; Site_1: DraIII
(CACTGTGTG); Site_2: DraIII (CACCATGTG); 1st strand cDNA
was primed with an oligo(dT) primer
[ATGTGGCCTTTTTTTTTTTTTTTTT]; double-stranded cDNA was
ligated to a DraIII adaptor [TGTTGGCCTACTGG], digested
and cloned into distinct DraIII sites of the pME18S-FL3
vector (5' site CACTGTGTG, 3' site CACCATGTG). XhoI
should be used to isolate the cDNA insert. Size selection was
performed to exclude fragments <1.5kb. Library
constructed by Dr. Sumio Sugano (University of Tokyo
Institute of Medical Science). Custom primers for
sequencing: 5' end primer CTTCTGCTCTAAAAGCTGCG and 3'
end primer CGACCTGCAGCTCGAGCACA."

/db_xref="taxon: 10090" 

/clone="IMAGE: 1891215" 

/clone_lib="Sugano mouse liver mlia"

/sex="female"

/dev_stage="adult" 

/lab_host="DH10B" 
BASE COUNT 245 a 267 c 231 g 272 t 4 others

ORIGIN 

1 ggctaagaga accccggtgc agttctactt cggtgcaggg cgtggaagat gcggaaggtg 
61 gttttgatca ccggggcgag cagtggcatt gggctagccc tttgcggtcg actgctggca 
121 gaagacgatg acctccacct gtgtttggcg tgtaggaacc tgagcaaagc aagagctgtt 
181 cgagataccc tgctggcctc tcacccctcc gccgaagtca gcatcgtgca gatggatgtc 
241 agcagcctgc agtcggtggt ccggggtgca gaggaagtca agcaaaagtt tcaaagatta 
301 gactacttat atctgaatgc tggaatcctg cctaatccac aattcaacct caaggcattt
361 ttctgcggca tcttttcaag aaatgtgatt catatgttca ccacagcgga aggaattttg 
421 acccagaatg actcggtcac tgccgacggg ttgcaggagg tgtttgaaac caatctcttt 
481 ggccacttta ttctgattcg ggaactggaa ccacttctct gccatgccga caacccctct
541 cagctcatct ggacgtcctc tcgcaatgca aagaaggcta acttcagcct ggaggacata 
601 cagcacttca aaggcccgga accctacagc tctttccaat atgctaccga cctcctgaat 
661 gtggctntga acagggaatt caaaccagaa ggtctggtat tcagtggtga ttgccgaggg 
721 cgtctgatga ccaatatgac gtatggaaat ttgccttcct ttatcctgac cgtggttcta 
781 cccttaagtg ggctccttcg cttttttgaa aatgccctca cctgggaccc cgtaccactg 
841 atcaaaagct ctgggtgtgt ttctttcaca tataaccgga ggcttttatt ctttgaccaa 
901 atacgcgagc tccaccttgg tagtgggact atataccgac cggtcccacg aatgcactca 
961 tttaacacct tgtcaaaact ttttatagtt ttacctgttg tgataacgtg gtgntacccc 
1021 cttcgtantt gnaataccc 

// 

FIGURE 51 (cont).






FIGURE 52. Mouse EST with Similarity to YLR100W



LOCUS AI528381 837 bp mRNA EST 18-MAR-1999
DEFINITION ui96g06.y1 Sugano mouse liver mlia Mus musculus cDNA clone
IMAGE:1890298 5' similar to TR:Q62904 Q62904 OVARIAN-SPECIFIC
PROTEIN. mRNA sequence.
ACCESSION AI528381
NID g4442516
KEYWORDS EST.
SOURCE house mouse.
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 837)
AUTHORS Marra,M., Hillier,L., Kucaba,T., Martin,J., Beck,C., Wylie,T.,
Underwood,K., Steptoe,M., Theising,B., Allen,M., Bowers,Y.,
Person,B., Swaller,T., Gibbons,M., Pape,D., Harvey,N., Schurk,R.,
Ritter,E., Kohn,S., Shin,T., Jackson,Y., Cardenas,M., McCann,R.,
Waterston,R. and Wilson,R.
TITLE The WashU-NCI Mouse EST Project 1999
JOURNAL Unpublished (1999)
COMMENT Contact: Marra M/WashU-NCI Mouse EST Project 1999
Washington University School of Medicine
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108, USA
Tel: 314 286 1800
Fax: 314 286 1810
Email: mouseest@watson.wustl.edu
This clone is available royalty-free through LLNL; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:974622
Possible reversed clone: similarity on wrong strand
Seq primer: custom primer used
High quality sequence stop: 429.
FEATURES Location/Qualifiers
source 1..837
/organism="Mus musculus"
/strain="C57BL"
/note="Organ: liver; Vector: pME18S-FL3; Site_1: DraIII
(CACTGTGTG); Site_2: DraIII (CACCATGTG); 1st strand cDNA
was primed with an oligo(dT) primer
[ATGTGGCCTTTTTTTTTTTTTTTTT]; double-stranded cDNA was
ligated to a DraIII adaptor [TGTTGGCCTACTGG], digested
and cloned into distinct DraIII sites of the pME18S-FL3
vector (5' site CACTGTGTG, 3' site CACCATGTG). XhoI
should be used to isolate the cDNA insert. Size selection was
performed to exclude fragments <1.5kb. Library
constructed by Dr. Sumio Sugano (University of Tokyo
Institute of Medical Science). Custom primers for
sequencing: 5' end primer CTTCTGCTCTAAAAGCTGCG and 3'
end primer CGACCTGCAGCTCGAGCACA."
/db_xref="taxon:10090"
/clone="IMAGE:1890298"
/clone_lib="Sugano mouse liver mlia"
/sex="female"
/dev_stage="adult"
/lab_host="DH10B"
BASE COUNT 191 a 222 c 212 g 208 t 4 others
ORIGIN



1 ggctaagaga accccggtgc agttctactt cggtgcaggg cgtggaagat gcggaaggtg 

61 gttttgatca ccggggcgag cagtggcatt gggctagccc tttgcggtcg actgctggca 

121 gaagacgatg acctccacct gtgtttggcg tgtaggaacc tgagcaaagc aagagctgtt 

181 cgagataccc tgctggcctc tcacccctcc gccgaagtca gcatcgtgca gatggatgtc 

241 agcagcctgc agtcggtggt ccggggtgca gaggaagtca agcaaaagtt tcaaagatta 

301 gactacttat atctgaatgc tggaatcctg cctaatccac aattcaacct caaggcattt 

361 ttctgcggca tcttttcaag aaatgtgatt catatgttca ccacagcgga aggaattttg 

421 acccagaatg actcggtcac tgccgaccgg ttgcaggagg tgtttgaaac caatctctct 

481 tgccacttta ttctgattcg ggaactggaa ccacttctct tgcatgcgga caacccctct 

541 cagctcatct ggacgtcctc tcgcaatgca nagaaggcta acttcagcct ggaggacatn 

601 cagcactcca tagggcccgg accctacagc tctttccaat atgctaccga cctcctgaat 

661 gtggctttga acangaatnt caaccagaag ggtctgtatt ccagtcgcat gtgcccaggc 

721 gtcgtgatga ccaatatgac gtatggaatc ttgcctccct tttatctgga cgtgctccta 

781 cccatgatgg tgctccttcg cttctttggt aatgcgctta ctgggacacc gtacaac 



// 



FIGURE 52 (cont).



LOCUS 3319971 334 aa 14-JUL-1998
DEFINITION 17-beta-hydroxysteroid dehydrogenase type 7.
ACCESSION 3319971
PID g3319971
DBSOURCE EMBL: locus MMY15733, accession Y15733
KEYWORDS
SOURCE house mouse.
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (residues 1 to 334)
AUTHORS Nokelainen,P., Peltoketo,H., Vihko,R. and Vihko,P.
TITLE Expression cloning of a novel estrogenic mouse 17
beta-hydroxysteroid dehydrogenase/17-ketosteroid reductase
(m17HSD7), previously described as a prolactin receptor-
associated protein (PRAP) in rat
JOURNAL Mol. Endocrinol. 12 (7), 1048-1059 (1998)
MEDLINE 98322544
REFERENCE 2 (residues 1 to 334)
AUTHORS Nokelainen,P.A.
TITLE Direct Submission
JOURNAL Submitted (27-NOV-1997) P. A. Nokelainen, University of Oulu,
Biocenter Oulu, WHO Collaborating Centre for Research on
Reproductive Health, Department of Clinical Chemistry, Kajaanintie
50, FIN-90220 Oulu, FINLAND
FEATURES Location/Qualifiers
source 1..334
/organism="Mus musculus"
/strain="BALB/c"
/db_xref="taxon:10090"
/tissue_type="mammary gland"
/cell_type="epithelial cell derived from mammary gland of
a pregnant mouse"
/clone_lib="cDNA library prepared from poly(A)-enriched
RNA isolated from HC11 cell line"
/clone="m17HSD7.1"
/clone="m17HSD7.2"
Protein 1..334
/product="17-beta-hydroxysteroid dehydrogenase type 7"
CDS 1..334
/gene="HSD17B7"
/db_xref="SPTREMBL:O88736"
/coded_by="Y15733:64..1068"
ORIGIN
1 mrkvvlitga ssgiglalcg rllaedddlh lclacrnlsk aravrdtlla shpsaevsiv
61 qmdvsslqsv vrgaeevkqk fqrldylyln agilpnpqfn lkaffcgifs rnvihmftta
121 egiltqndsv tadglqevfe tnlfghfili relepllcha dnpsqliwts srnakkanfs
181 lediqhskgp epyssskyat dllnvalnrn fnqkglyssv mcpgvvmtnm tygilppfiw
241 tlllpimwll rffvnaltvt pyngaealvw lfhqkpesln pltkyasats gfgtnyvtgq
301 kmdidedtae kfyevllele krvrttvqks dhps



// 
FIGURE 54. YER034W DNA Sequence

Sequence contains 559bp of 5' promoter sequence.

Symbols: 1 to: 1117 from: chr5.gcg ck: 9036, 221286 to: 222402

Chromosome V Sequence

Nature 387:78-81 [97313264] (1997) The nucleotide sequence of
Saccharomyces cerevisiae chromosome V.

Dietrich, F. S., Mulligan, J., Hennessy, K., Yelton, M. A., Allen, E.,
Araujo, R., Aviles, E., Berno, A., Brennan, T., Carpenter, J., Chen, ...

gcgseq.tmp.6597 Length: 1117 March 26, 1999 16:54 Type: N Check: 5026

1 TGATGAAATA TTCCAGTTAT GCGTGTGCGT CTTGTGATGC AGATCCTTTT 

51 GGGCAAAAAC AGTTGGTTTG TGCGAAAACG CAAGGTAATA AATAGGCTTA 

101 AAGGAACTAA AAAAAAAAAA AGGAAAATAA CCAGCTAAGA TTTAAGGTAC 

151 AAGAAAGCGG TTGCACCTCA AGTAATGATA GTTATTAAAC CTTGGATTGG 

201 ACCAGATGTT TAAAATTGTT TTCAATAGTA GATTTGCAGT CGTAAATGCG 

251 TTCTCAGCAA TATCATATTG TGTTTATGAA GTATTACCAA ACGGGTAGAA 

301 GAACGGTTTA AGAGAATATG TCCGGATAAA GCGATCAGGA GAAAAGCTTA 

351 AAACCCAAAG TGGTCAATCT GCAGCCCATT TAGGCACTCT GCATTTAACC 

401 GATACCCGGA TTGAAGAAAG CTGGCGGGTG TATGGGTGAA GGAGAAGAAA 

451 GGAAGTGATT AGGAGAAACC TCATGGAGAT GAGCACATGC TACAACTAAT 

501 AACGTTATTC TACTTAAAAC GAGCAAAACA AAAAAAAAAA CAAGACAATT 

551 GAAAACGCAA TGGATGCATT CAGCTTAAAG AAGGATAATC GAAAAAAATT 

601 TCAAGATAAA CAGAAATTGA AAAGAAAACA TGCCACACCC AGTGATAGAA 

651 AGTACCGGCT ATTGAACCGC CAAAAAGAAG AGAAAGCTAC CACAGAGGAG 

701 AAAGATCAAG ACCAAGAACA GCCCGCCCTG AAGTCAAACG AGGACAGGTA 

751 CTATGAGGAC CCGGTACTCG AGGACCCGCA TTCTGCAGTC GCCAATGCAG 

801 AGTTGAACAA GGTGCTAAAA GACGTCCTCA AAAATCGGCT CCAGCAGAAC

851 GACGACGCCA CAGCCGTCAA TAATGTTGCT AATAAAGATA CTTTGAAAAT

901 CAAAGACCTC AAGCAGATGA ATACGGATGA GCTCAATCGT TGGCTCGGAC 

951 GGCAGAATAC AACATCGGCT ATAACAGCGG CTGAGCCCGA ATCATTAGTC 

1001 GTTCCCATTC ACGTACAAGG TGATCATGAT CGTGCGGGCA AGAAGATCAG 

1051 TGCCCCTTCG ACCGATCTAC CGGAAGAACT AGAGACCGAT CAGGATTTCC

1101 TTGATGGACT GCTCTAA 
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FIGURE 55. rSHOJ^v Protein Sequence 

. Nature 387:78-81 (97313264] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome V. 

Dietrich, F. S., Mulligan, J., Hennessy, K. , Yelton, M. A,, Allen, E., 

Araujo, R. , Aviles, E,, Berno, A., Brennan, T., Carpenter, J., Chen, 

E., Cherry, J. M., Chung, £., Duncan, M., Guzman, E., Hartzell, G., et al 

YER034W Length: 185 March 26, 1999 16:55 Type: P Check: 3501 
1 MDAFSLKKDN RKKFQDKQKL KRKHATPSDR KYRLLNRQKE EKATTEEKDQ 
51 DQEQPALKSN EDRYYEDPVL EDPHSAVANA ELNKVLKDVL KNRLQQNDDA 
101 TAVNNVANKD TLKIKDLKQM NTDELNRWLG RQNTTSAITA AEPESLVVPI
151 HVQGDHDRAG KKISAPSTDL PEELETDQDF LDGLL 
FIGURE 56. YKL077W DNA Sequence

Sequence contains 1200bp of 5' promoter sequence.

Symbols: 1 to: 2379 from: chr11.gcg ck: 9298, 289895 to: 292273

Chromosome XI Sequence

Nature 387:98-102 [97313270] (1997) The nucleotide sequence of
Saccharomyces cerevisiae chromosome XV.

Dujon, B., Albermann, K., Aldea, M., Alexandraki, D., Ansorge, W.,
Arino, J., Benes, V., Bohn, C., Bolotin-Fukuhara, M., Bordonne, R., ...

gcgseq.tmp.4920 Length: 2379 March 26, 1999 16:48 Type: N Check: 4118

1 GAAAGGAAGC TATAGTAATG GGGCTTCAGG AACTTTATGA ATTGGGTGCT

51 CTTGACACTC GTGGAAAGAT AACTAAACGG GGTCAACAAA TGGCTCTGTT 

101 ACCGCTACAA CCGCATTTAA GTAGTGTCTT AATTAAAGCC AGTGAAGTCG 

151 GATGTTTGAG TCAGGTCATT GATATCGTCT CTTGCCTTAG TGTGGAAAAT 

201 TTACTGTTGA ATCCGTCACC AGAAGAAAGA GATGAGGTGA ACGAGCGTCG 

251 TTTGTCCTTA TGCAACGCTG GTAAAAGGTA TGGTGACCTT ATCATGCTGA 

301 AAGAGCTTTT TGATATCTAT TTCTACGAAC TAGGGAAAAG TCAAGATGCA 

351 AGCTCTGAAA GAAATGATTG GTGTAAAGGA TTGTGTATTT CGATACGTGG 

401 GTTTAAAAAT GTAATTCGTG TCAGAGACCA GTTAAGAGTT TATTGTAAGC 

451 GTTTGTTTTC TTCAATCAGT GAAGAGGATG AAGAATCCAA AAAGATTGGT 

501 GAAGATGGCG AGCTAATTTC GAAAATTTTA AAGTGTTTCT TAACTGGGTT 

551 TATCAAGAAT ACAGCTATAG GGATGCCAGA CAGGTCTTAT AGAACTGTTT 

601 CCACTGGAGA GCCGATAAGC ATTCATCCAT CATCTATGCT ATTTATGAAT 

651 AAAAGCTGCC CCGGTATAAT GTACACGGAG TATGTCTTTA CTACGAAGGG

701 ATATGCCAGA AATGTTAGTA GGATTGAACT TTCATGGTTA CAAGAAGTTG 

751 TCACTAATGC AGCCGCTGTA GCAAAGCAAA AAGTTTCTGA TTCAAAATAA 

801 GTCACCTACT CTTAGCGCAT TTTTATTGTA TATAAAGGCA TTTAATGTAA 

851 TTTATAGAGC ATTATAAATC GTAACAACTA CTGCAGTATG AGTTTCATGG 

901 ATTCATTTCT CAATATCTTA TGAATATACA CAGGTATATA TGTATATTCA 

951 TGTTAAACGC CTTTCGAATT GTTCGTTGGC TTTTTTTGTG AAATTATCTC 

1001 GGGAAAAGGG CGAAATTATA TTATTTTGCC GTTGACATTT TGAAAAGGAA 

1051 TAAAAGATCA TGAAAAAAAT AAGAAAGGCA ATTCGACGCA TTTCTCTCAG 

1101 CAAGCTATTC TTTACTTTTG AAGAACAAAA TATTTTAGCA AAAAGGTTAA 

1151 GACAATATAG TCGGAAGCAG TTCTGCGGGA TCTGAAGGAA TTGCGGAATA 

1201 ATGAGATTTC ACGATAGTAT ACTTATCTTC TTTTCTTTGG CATCGCTTTA 

1251 TCAACATGTT CATGGTGCAA GACAAGTCGT TCGTCCAAAG GAGAAAATGA 

1301 CTACTTCAGA AGAAGTTAAA CCTTGGTTAC GTACGGTTTA TGGAAGTCAA 

1351 AAAGAATTAG TCACTCCTAC GGTCATTGCC GGTGTCACTT TTTCTGAAAA

1401 ACCAGAAGAA ACACCAAATC CATTGAAACC TTGGGTATCT TTAGAGCATG 

1451 ATGGTAGGCC AAAAACCATT AAACCAGAAA TCAACAAAGG TCGAACCAAG

1501 AAGGGAAGAC CTGATTACTC AACTTACTTC AAAACGGTAA GTTCCCACAC 

1551 ATATTCTTAT GAAGAATTGA AGGCTCACAA TATGGGCCCT AATGAAGTTT 

1601 TTGTAGAAGA AGAGTATATT GATGAAGATG ACACCTACGT CTCCCTGAAT 

1651 CCTATTGTCA GATGTACTCC TAATCTTTAC TTCAATAAAG GTCTAGCAAA 

1701 GGATATCCGC AGTGAGCCAT TTTGTACCCC TTATGAGAAT TCTAGATGGA 

1751 AGGTTGACAA GACTTACTTC GTTACTTGGT ATACAAGATT TTTTACAGAT 

1801 GAGAATTCCG GTAAAGTTGC TGATAAGGTT CGTGTTCATT TGTCCTATGT 

1851 TAAAGAAAAC CCCGTAGAGA AGGGCAATTA TAAAAGAGAT ATCCCTGCAA 

1901 CTTTTTTCTC TTCCGAATGG ATTGATAATG ACAACGGTCT AATGCCGGTT 

1951 GAGGTCAGAG ATGAATGGCT GCAGGACCAA TTTGATCGTA GGATCGTTGT 

2001 ATCAGTTCAG CCAATATACA TATCAGATGA AGATTTCGAT CCACTACAAT 

2051 ACGGTATTTT ATTATACATC ACTAAGGGTT CAAAAGTGTT TAAGCCTACT 

2101 AAGGAGCAAC TGGCTTTAGA CGATGCAGGT ATAACAAATG ATCAGTGGTA 

2151 TTATGTTGCA TTATCTATCC CTACTGTCGT GGTGGTATTT TTCGTCTTCA 

2201 TGTACTTTTT CTTATATGTC AACGGGAAAA ACAGAGATTT CACAGATGTT 

2251 ACTAGAAAAG CTTTAAACAA GAAACGCCGT GTTTTGGGTA AGTTCTCGGA 

2301 GATGAAGAAA TTCAAAAACA TGAAAAATCA CAAGTACACC GAATTGCCAT 

2351 CTTATAAGAA AACCAGTAAA CAAAATTAG 
FIGURE 57. YKL077W Protein Sequence

Nature 387:98-102 [97313270] (1997) The nucleotide sequence of
Saccharomyces cerevisiae chromosome XV.

Dujon, B., Albermann, K., Aldea, M., Alexandraki, D., Ansorge, W.,
Arino, J., Benes, V., Bohn, C., Bolotin-Fukuhara, M., Bordonne, R.,
Boyer, J., Camasses, A., Casamayor, A., Casas, C., Cheret, G., et al.

YKL077W Length: 392 March 26, 1999 16:50 Type: P Check: 1732

1 MRFHDSILIF FSLASLYQHV HGARQVVRPK EKMTTSEEVK PWLRTVYGSQ

51 KELVTPTVIA GVTFSEKPEE TPNPLKPWVS LEHDGRPKTI KPEINKGRTK 

101 KGRPDYSTYF KTVSSHTYSY EELKAHNMGP NEVFVEEEYI DEDDTYVSLN 

151 PIVRCTPNLY FNKGLAKDIR SEPFCTPYEN SRWKVDKTYF VTWYTRFFTD 

201 ENSGKVADKV RVHLSYVKEN PVEKGNYKRD IPATFFSSEW IDNDNGLMPV 

251 EVRDEWLQDQ FDRRIVVSVQ PIYISDEDFD PLQYGILLYI TKGSKVFKPT

301 KEQLALDDAG ITNDQWYYVA LSIPTVVVVF FVFMYFFLYV NGKNRDFTDV

351 TRKALNKKRR VLGKFSEMKK FKNMKNHKYT ELPSYKKTSK QN 
FIGURE 58. YGR046W DNA Sequence

Sequence contains 599bp of 5' promoter sequence.
Symbols: 1 to: 1757 from: chr7.gcg ck: 9962, 584290 to: 586046

Chromosome VII Sequence

Nature 387:81-84 [97313265] (1997) The nucleotide sequence of
Saccharomyces cerevisiae chromosome VII.

Tettelin, H., Agostoni Carbone, M. L., Albermann, K., Albers, M.,
Arroyo, J., Backes, U., Barreiros, T., Bertani, I., Bjourson, A. J., ...

gcgseq.tmp.228 Length: 1757 March 26, 1999 16:44 Type: N Check: 9449



1 TCTCACTCCG GCGGCCATTT TACGTGACGA AGCATCCCTT ACAACAGAAA

51 GAAGAAAAAA GATATGCCGC TTTGCGGTTT CTTTCTGGCA ATGTATGCAC

101 TCATAATGCT ACTCGTTTAC CCACTATCCC TGTCCAAACT AAAGAGGGAG

151 GAAAGCACTT TTTGCATTTA CACATCGTAG ATTATAAAAT GATCGTTAAC

201 AGGCGCTTGT GATTTTGAAT TTAAGAAATG TGGACTAGAG AAGTCTTAAA

251 TCGCCAATGC TGTACCAGAC TCTCTATAGC ATCTAAACAC GAAATTCAAC

301 TGTTATCTTA GTTTTTCACT TACCAGTAGC GCGCTTGTTA TTCCCACGTT

351 ATTATTTGCC CCCACATCAT AGGTCAAGTG ACCTTCTCTT ACCCGACATG

401 TJ^TAAAGAAA AGAAAAGAAA TCATACCCTT CAGCCTGTTT AGCCATAAAT

451 AGTAAAGAGT AGATGTTTCG ACGGACTAAA TAATGTGAAA AAGGTTCTAA

501 AACCTTCAAA ACAATTAAAC TTGAGAAACG TTGCTATAGG ATTGAGCTAA

551 TAATTTGAAT TAATAGGAGC TGCTTTTTAC TTTGATATAT CCTGAAGTTA

601 TGTTACGAGT TTCTGAAAAT GGTCTACGGT TTCTGCTGAA ATGCCATTCA

651 ACGAACGTAA GCATGTTTAA TAGGCTTCTG AGTACTCAAA TAAAGGAGGG

701 GAGAAGTTCC ATAGATGATG CTGGCATTAT CCCCGATGGA ACTATTAACG

751 AAAGGCCGAA TCACTACATC GAGGGAATTA CTAAAGGCAG TGATCTGGAC

801 CTCTTGGAAA AAGGTATAAG AAAAACTGAC GAAATGACTT CCAATTTTAC

851 AAATTATATG TACAAGTTTC ACAGATTGCC CCCCAACTAT GGAAGTAACC

901 AGCTCATTAC TATCGATAAG GAACTTCAAA AGGAACTGGA TGGGGTAATG

951 TCATCCTTCA AAGCTCCGTG CCGGTTTGTA TTTGGTTACG GCTCAGGAGT

1001 TTTCGAACAA GCGGGATATT CCAAAAGTCA TAGCAA^CCT CAAATCGATA

1051 TAATCTTGGG CGTCACATAT CCATCACATT TTCACTCTAT TAATATGAGG

1101 CAGAATCCGC AACATTATTC AAGTTTGAAA TACTTCGGTT CCGAGTTCGT

1151 GTCTAAATTT CAACAGATCG GCGCAGGCGT ATATTTTAAT CCATTTGCAA

1201 ATATAAATGG ACACGACGTA AAATATGGGG TGGTTTCTAT GGAAACACTT

1251 TTAAAGGACA TAGCTACTTG GAATACATTC TATTTAGCAG GACGACTACA

1301 AAAGCCTGTA AAAATATTGA AGAATGATTT GAGAGTGCAA TATTGGAACC

1351 AATTAAACTT AAAAGCTGCA GCTACTTTGG CCAAACATTA CACCTTAGAG

1401 AAAAATAACA ATAAGTTTGA CGAATTCCAA TTTTACAAGG AGATCACTGC

1451 CTTAAGTTAT GCAGGTGATA TTAGATACAA ACTGGGTGGA GAAAATCCCG

1501 ACAAAGTTAA CAACATTGTT ACCAAAAACT TTGAAAGATT TCAAGAGTAT

1551 TACAAGCCGA TTTACAAAGA AGTGGTCCTA AATGATTCAT TTTATCTTCC

1601 AAAAGGGTTC ACATTAAAGA ATACTCAGAG ACTTTTGCTC AGCCGTATTA

1651 GTAAATCAAG TGCATTACAA ACTATTAAAG GTGTTTTCAC AGCTGGAATC

1701 ACAAAGTCAA TTAAGTATGC TTGGGCCAAA AAACTAAAAT CGATGAGGAG

1751 AAGCTAG
FIGURE 59. YGR046W Protein Sequence

Nature 387:81-84 [97313265] (1997) The nucleotide sequence of
Saccharomyces cerevisiae chromosome VII.

Tettelin, H., Agostoni Carbone, M. L., Albermann, K., Albers, M.,
Arroyo, J., Backes, U., Barreiros, T., Bertani, I., Bjourson, A. J.,
Bruckner, M., Bruschi, C. V., Carignani, G., Castagnoli, L., Cerdan, et al.

YGR046W Length: 385 March 26, 1999 16:46 Type: P Check: 4137

1 MLRVSENGLR FLLKCHSTNV SMFNRLLSTQ IKEGRSSIDD AGIIPDGTIN

51 ERPNHYIEGI TKGSDLDLLE KGIRKTDEMT SNFTNYMYKF HRLPPNYGSN 

101 QLITIDKELQ KELDGVMSSF KAPCRFVFGY GSGVFEQAGY SKSHSKPQID 

151 IILGVTYPSH FHSINMRQNP QHYSSLKYFG SEFVSKFQQI GAGVYFNPFA 

201 NINGHDVKYG VVSMETLLKD IATWNTFYLA GRLQKPVKIL KNDLRVQYWN

251 QLNLKAAATL AKHYTLEKNN NKFDEFQFYK EITALSYAGD IRYKLGGENP 

301 DKVNNIVTKN FERFQEYYKP IYKEVVLNDS FYLPKGFTLK NTQRLLLSRI
351 SKSSALQTIK GVFTAGITKS IKYAWAKKLK SMRRS 
FIGURE 60. YJR041C DNA Sequence

This sequence includes 1000bp of 5' promoter sequence.

Symbols: 1 to: 4525 from: chr10.gcg /rev ck: 4711, 509927 to: 514451

Chromosome X Sequence

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide sequence of
Saccharomyces cerevisiae chromosome X.

Galibert, F., Alexandraki, D., Baur, A., Boles, E., Chalwatzis, N.,
Chuat, J. C., Coster, F., Cziepluch, C., De Haan, M., Domdey, H., . . .

gcgseq.tmp.25123 Length: 4525 March 26, 1999 11:33 Type: N Check: 4481



1 TACCTGCTGT AGAATCCTTC ACTGAAAACA CTTGTTCAAT ATATTCTTCA

51 TCTGGTTCAC CGTCTGATCT ATTAATCCAG TTTAGCAATG ACTCAATAAA

101 CTCTGATCTG TTCTCCTCTA CATCCTGACC ATCTAATATG AAGTACATTG

151 TCCTCAGACA GTTTAAAACG GTTAAAGATT CTTCCAACTC ATAAAATCGG

201 TTCACTCTTC CATCCTGATC CTTGACTCTA CCAATAAACA CTTCCAATTC

251 ATTCAGAATC GCCTCCATGG CCAGATTTAC TGTTGCATTA TGCTCCTTCG

301 CGAAATTAGA ATTAACAACT CCAATCGTTG GTACATTAAA CACTCTGTCA

351 TCACCTAAAT CACGGTAAAT TTCAAATAAA CCTGATACGT ATGCAGAAAA

401 CTCTTTGCTG GTATCTAATC TAGGAATTCT AACAGGATAA AGCTTATATT

451 TATCTTTTGC AGTTATGAAT GCCATATTTT GGTAAGAAAG TGGCCCCAGC

501 TTGAACTTTA AAGGCATCTT GTCGCCATTT TTTTCAATCG GTTGATCATT

551 TACAGTCATA GGGACCAGGA TAGCCCCGCT GACTGGGTCC CTTTTATATA

601 GTTGTTCTTC TTCATCGGTC TTGTTATTAC TAAGTTGCGC CGTTCCGTCG

651 TCCAAAAAAT CAAATTGATC GACGTCCATA AGTAATCGAT TTGAATCATC

701 GATTGTCATA TCTGATAATT GCGTTCTGGC TCACGCTTAT TGACTCAACT

751 CAAGACCGTA AGTTCAATGT TTTCTATACA ACTACAATTT GTACAAGGCT

801 TGACTTCCAT CCAACTAAAA AACCTCTCCG TCGTGCGCGA TCTGAAAAAT

851 TTCACTTAGC TCATCTCAAA ATGATCGCTA AGAGGGCACT TGGTCACAAC

901 TACAGAATTG TTTACTAGCA TAGGAACATC TCTGTCTAAG ATTTAGCTTG

951 CCATCAATTA TCTTTGGAAA AACAGAGAGT ATACTGCACT TTTTGATAAT

1001 ATGGGTGATC TTACAGAAGA ACTATCTATC CCAGACAATG CCCAAGATTT

1051 GTCGAAATTA CTACGTTCGA CGAGCACAAA ACCCCATCAA ATTGCCGAGA

1101 TAGTTTCAAA ATTTGATAAA TTAGAAACCT ACTTTCCAAA AAAAGAAATT

1151 TTCGTCTTAG ATTTACTCAT TGATAGGCTC AACAATGGAA ATTTGGATGA

1201 TTTTAAGACC AGTGAACATA CTTGGATAAT TTTCACGAGA TTATTAGATG

1251 CTATTAATGA TCCAATTTCG ATAAAAAAAC TACTCAAAAA ATTGAAGACT

1301 GTGCCTGTAA TGATAAGAAC ATTTTTCCTT TGGCCTAAAG ACAAATTACT

1351 TACACGTAGC GTTTCGTTTA TAAAAGCATT TTTTGCGATT AATGACTACT

1401 TGATTGTCAA TTTTTCTGTT GAAGAGTCTT TTCAACTTTT AGAACATGCC

1451 ATAAATGGAT TAAGTTCGTG CCCGACGACT GACTTTGCGC TTTCATAdTT

1501 GCAAGATGCC TGCAATCTAA CTCATGTTGA CAATATTACT ACAACGGATA

1551 ACAAAATTGC TACTTGTTAC TGCAAGCATA TGCTACTACC AAGTTTAAGA

1601 TATTTCGCAC AGACCAAAAA TTCTGCATCT TCAAACCAAT CCTTCATTCG

1651 TCTATCTCAT TTTATGGGAA AGTTCCTTTT ACAACCACGC ATAGATTACA

1701 TGAAATTAAA TAAAAAGTTT GTCCAAGAGA ATGCGTCCGA AATTACCGAC

1751 GATATGGCTT ATTATTATTT TGCCACTTTC GTCACTTTCT TATCAAAAGA

1801 CAATTTTGCT CAACTAGAAG TCATCTTTAC AATTTTAGGT GCCAAGAAAC

1851 CTAGTTTAGA ATGCAGATTT CTGAATCTTT TATCGGAATC GAAGAAAACC

1901 GTATCTCAAG AGTTCCTTGA AGCATTATTG CTTGAAATGT TAGCGTCGAC

1951 TGATGAATCT GGAGTGTTAT CATTAATACC AATTATCCTT AAATTGGATA

2001 TCGAGGTTGC TATTAAACAT ATTTTTCGGT TACTTGAATT GATTCAGCTC

2051 GAAAATTTGA ACGATCCTCT CTTTTCCTCT CATATTTGGG ATTTAATAAT

2101 CCAATCACAC GCTAACGCAA GGGAATTATC AGATTTTTTT GCCAAAATAA

2151 ATGAGTACTG TTCCAGAAAA GGACCCGATT CCTATTTTTT GATAAATCAT

2201 CCTGCATATG TCAAGTCTAT AACGAAGCAA TTGTTCACTT TATCTTCTTT

2251 ACAATGGAAA AATCTATTGC AAGCTTTACT TGACCAAGTC AATCACGATT

2301 CCACCAACAG GGTTCCTTTA TATTTAATAC GCATATGCTT GGAGGGACTA

2351 TCAGAGGGCG CATCGCGCGC AACTCTCGAT GAGGTAAAGC CTATTTTATC

2401 TCAAGTATTT ACTTTGGAAT CATTTAATAA CAGTCTTCAA TGGGACCTAA 

2451 AGTATCATAT AATGGAAGTC TACGATGATA TTGTCCCTGC AGAGGAACTA 

2501 GAAAAAATCG ATTACGTGTT ATCTTCTAAT ATTTTTGATA CTACATCGGC 

2551 TGATGTTGAA GAACTGTTCT TTTATTGCTT CAAATTGAGA GAATATATTT 

2601 CGTTCGATCT TTCTGATGCA AAAAAAAAAT TCATGAGGCA CTTTGAAATC 

2651 CTTGACGAAG AAAGAAAGTC AAACTTATCA TACTCTGTTG TGTCCAAATT 

2701 TGCAACATTA GTAAACAACA ACTTTACAAG AGAACAAATT TCTTCTTTAA 

2751 TTGATTCATT ACTATTGAAC TCGACAAATT TATCTTCGTT ATTAAAAAAT 

2801 GATGACATTT TTGAGGAGAC AAATATCACG TACGCTTTAA TAAACAAGCT 

2851 TGCTTCATCA TACCATCAAA CCTTCGCTCT AGAAGCTTTG ATTCAAATTC 

2901 CTATCCAATG CATCAACAAA AACGTTAGAG TGGCTCTCAT TAACAATCTA

2951 ACATGCGAAT CATTTTGCCT TGATTCCGCT ACTAGAGAAT GCCTCCTTCA 

3001 TTTATTGTCA AGCCCGACCT TCAAGAGCAA CATTGAAACA AATTTCTACG 

3051 AATTATGTGA GAAAACAATA ATGAGCCCCG AAATGGCCAT TTCAGAGACA 

3101 GGTGATGAAA AAAAGGAAAT AGAAGACAAA ATATCTATTT TCGAAAAAGT

3151 TTGGACTAAT CATCTGTCAC AGGCAAAGGA GCCTGTGAGT GAGAAGTTCT 

3201 TAGAATCTGG TTACGATATC GTTAAACAGT CAATGTCATT GTCCAATGGT 

3251 GATAGCAAAC TAATTATCGC CGGGTTTACT ATCGCAAAAT TTTTGAAACC

3301 AGATAACAAG CATAGAGATA TACAAGGTAT GGCAATTAGC TATGCTGTTA 

3351 AAATTTTGGA AAACTACTCT GAAAATTTTG AATCTGAAAC AATTCCCCTT 

3401 TTCAGAATAT CAATGTCTAC ATTGTACAAG ATTATAACGA CCGGACAAGG 

3451 CGATATTTCT AAGCATAAAT CGAGAATTCT GGATATATTT TCCAAAATTA 

3501 TGCTTCGATA TCATTCTAAA AAAGTGTACC ATGCGCCAGA AGAACAGGAA 

3551 ATGTTTTTGG TTCATTCACT CCTTACAGAA AACAAGTTGG AGTATATTTT 

3601 TGCAGAGTAC TTAAATATTG AGCATACAGA TAAGTGCGAT TCTGCCTTGG 

3651 GGTTCTGCTT GGAAGAAAGT CTTAAACAAG GTCCTGATGC GTTTAACCGC 

3701 CTGCTCTGGA ACAGTGCTAA ATCGTTTTCC ACCATTAGCC AACCTTGTGC 

3751 TGAAAAATTT GTGAGAGTTT TTATCATAAT GTCAAAAAGG ATTGCAAGAG 

3801 ACAATAACCT TGGTCATCAC CTATTTGTGA TAGCTTTACT TGAAGCCTAC 

3851 ACCTATTGTG ATATAGAAAA ATTTGGCTAC AAGTCATACT TGCTACTGTT 

3901 CAATGCTATC AAGGAGTTCT TAGTATCGAA ACCATGGCTA TTCAGCCAAT 

3951 ACTGTATTGA AATGCTGCTT CCTTTCTGTT TAAAAACTCT CGCTTTTATA 

4001 GTAAACCATG AGTCAACGGA TGAAATCAAT GAAGGCTTTA TTAACATCAT 

4051 CGAAGTGATA GATCATATGC TATTAGTTCA CAGGTTTAAA TTTTCCAATC 

4101 GTCACCATTT GTTTAACTCC GTTCTTTGCC AGATACTAGA AATAATAGCA 

4151 ATTCATGATG GTACATTGTG TGCAAATTCA GCAGACGCCG TAGCCAGACT 

4201 AATAACGAAC TACTGCGAGC CTTATAATGT ATCAAACGCT CAAAATGGGC 

4251 AGAAAAATAA CTTAAGCTCA AAGATAAGTT TGATAAAGCA GTCCATCAGA 

4301 AAAAATGTAC TTGTGGTTCT AACGAAATAT ATACAGTTGT CTATTACGAC 

4351 GCAGTTCAGT TTAAACATAA AAAAGAGTCT GCAGCCCGGT ATTCATGCGA 

4401 TTTTTGATAT ATTATCTCAG AACGAGTTGA ATCAATTGAA CGCTTTCCTT 

4451 GACACACCTG GGAAACAATA TTTCAAAGCA CTTTACCTCC AATACAAAAA 

4501 GGTTGGTAAA TGGCGCGAAG ATTAA 



FIGURE 60 (cont).
FIGURE 61. YJR041C Protein Sequence

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide sequence of
Saccharomyces cerevisiae chromosome X.

Galibert, F., Alexandraki, D., Baur, A., Boles, E., Chalwatzis, N.,
Chuat, J. C., Coster, F., Cziepluch, C., De Haan, M., Domdey, H.,
Durand, P., Entian, K. D., Gatius, M., Goffeau, A., Grivell, L. A., et al.



YJR041C Length: 1174 March 26, 1999 11:35 Type: P Check: 5083 

1 MGDLTEELSI PDNAQDLSKL LRSTSTKPHQ IAEIVSKFDK LETYFPKKEI

51 FVLDLLIDRL NNGNLDDFKT SEHTWIIFTR LLDAINDPIS IKKLLKKLKT 

101 VPYMIRTFFL WPKDKLLTRS VSFIKAFFAI NDYLIVNFSV EESFQLLEHA 

151 INGLSSCPTT DFALSYLQDA CNLTHVDNIT TTDNKIATCY CKHMLLPSLR 

201 YFAQTKNSAS SNQSFIRLSH FMGKFLLQPR IDYMKLNKKF YQENASEITD 

251 DMAYYYFATF VTFLSKDNFA QLEVIFTILG AKKPSLECRF LNLLSESKKT 

301 VSQEFLEALL LEMLASTDES GVLSLIPIIL KLDIEVAIKH IFRLLELIQL 

351 ENLNDPLFSS HIWDLIIQSH ANARELSDFF AKINEYCSRK GPDSYFLINH 

401 PAYVKSITKQ LFTLSSLQWK NLLQALLDQV NHDSTNRVPL YLIRICLEGL 

451 SEGASRATLD EVKPILSQVF TLESFNNSLQ WDLKYHIMEV YDDIVPAEEL 

501 EKIDYVLSSN IFDTTSADVE ELFFYCFKLR EYISFDLSDA KKKHMRHFEI 

551 LDEERKSNLS YSVVSKFATL VNNNFTREQI SSLIDSLLLN STNLSSLLKN

601 DDIFEETNIT YALINKLASS YHQTFALEAL IQIPIQCINK NVRVALINNL 

651 TCESFCLDSA TRECLLHLLS SPTFKSNIET NFYELCEKTI MSPEMAISET 

701 GDEKKEIEDK ISIFEKVWTN HLSQAKEPVS EKFLESGYDI VKQSMSLSNG 

751 DSKLIIAGFT IAKFLKPDNK HRDIQGMAIS YAVKILENYS ENFESETIPL

801 FRISMSTLYK IITTGQGDIS KHKSRILDIF SKIMLRYHSK KVYHAPEEQE 

851 MFLVHSLLTE NKLEYIFAEY LNIEHTDKCD SALGFCLEES LKQGPDAFNR 

901 LLWNSAKSFS TISQPCAEKF VRVFIIMSKR IARDNNLGHH LFVIALLEAY

951 TYCDIEKFGY KSYLLLFNAI KEFLVSKPWL FSQYCIEMLL PFCLKTLAFI 

1001 VNHESTDEIN EGFINIIEVI DHMLLVHRFK FSNRHHLFNS VLCQILEIIA 

1051 IHDGTLCANS ADAVARLITN YCEPYNVSNA QNGQKNNLSS KISLIKQSIR 

1101 KNVLVVLTKY IQLSITTQFS LNIKKSLQPG IHAIFDILSQ NELNQLNAFL

1151 DTPGKQYFKA LYLQYKKVGK WRED 
FIGURE 62. BES1 DNA Sequence

DNA sequence includes 1089bp of 5' promoter sequence.

Symbols: 1 to: 2394 from: chr15.gcg ck: 9129, 780903 to: 783296

Chromosome XV Sequence

Nature 387:98-102 [97313270] (1997) The nucleotide sequence of
Saccharomyces cerevisiae chromosome XV.

Dujon, B., Albermann, K., Aldea, M., Alexandraki, D., Ansorge, W.,
Arino, J., Benes, V., Bohn, C., Bolotin-Fukuhara, M., Bordonne, R., . . .

gcgseq.tmp.10515 Length: 2394 March 26, 1999 14:35 Type: N Check: 4842

1 CATGGCTGGA GGAAAGATTC CTATTGTAGG AATTGTGGCA TGTTTACAGC 

51 CGGAGATGGG GATAGGATTT CGTGGAGGTC TACCATGGAG GTTGCCCAGT 

101 GAAATGAAGT ATTTCAGACA GGTCACTTCA TTGACGAAAG ATCCAAACAA 

151 AAAAAATGCT TTGATAATGG GAAGGAAGAC ATGGGAATCC ATACCGCCCA 

201 AGTTTCGCCC ACTGCCCAAT AGAATGAATG TCATTATATC GAGAAGCTTC 

251 AAGGACGATT TTGTCCACGA TAAAGAGAGA TCAATAGTCC AAAGTAATTC 

301 ATTGGCAAAC GCAATAATGA ACCTAGAAAG CAATTTTAAG GAGCATCTGG 

351 AAAGAATCTA CGTGATTGGG GGTGGCGAAG TTTATAGTCA AATCTTCTCC 

401 ATTACAGATC ATTGGCTCAT CACGAAAATA AATCCATTAG ATAAAAACGC 

451 AACTCCTGCA ATGGACACTT TCCTTGATGC GAAGAAATTG GAAGAAGTAT 

501 TTAGCGAGCA AGATCCGGCC CAGCTGAAAG AATTTCTTCC CCCTAAAGTA 

551 GAGTTGCCCG AAACAGACTG TGATCAACGC TACTCGCTGG AAGAAAAAGG 

601 TTATTGCTTC GAATTCACTC TATACAATCG TAAATGAAAC CTCTCCGCCC 

651 GTATATTTTT TTTAATATGT TAAATAGTGA TAGAACTGAT AAGCCTCATT

701 TTCTTTTATT GGGCTCCAAG ACGCGAACTG TTCGTAGGGT AACCGTTTGA 

751 CACCTAAACG ACCTTTCAGC CTCACCTGCA GTATTTCTTC J\ACAACGCCT 

801 GTCGCTATGT TAAATAATAG CAATCGTTTG TGATCACCAT TGTCGAATTT 

851 GACGCGCTTA AACAAAAACC ATTGTTTTGG CCTCGTTCCC TGCATTCAAC 

901 AAAAGAGCAA GGTATGCCGT CAAACAGTCG TTAAAAGAGA AGGTTTATAA 

951 ACTATCTTGT TTTGTACTTT GCTGTCCCGG ATCCAGTTGG GTCTTCTTTT 

1001 CAACCTGTCT GAGTCCGATC TTTCTTTCCC TACTTGAAGC TCCATATATC 

1051 TAAGTCATCT AAGTGTATCC TGCTAGATTA CAAACGAAAA TGTCTCAACA 

1101 CGCAAGCTCA TCTTCTTGGA CTTCTTTTTT GAAATCGATA AGTTCGTTCA 

1151 ACGGAGATCT ATCGTCTTTG TCTGCACCAC CGTTTATTCT TTCTCCCACT 

1201 TCCTTAACAG AGTTTTCTCA GTATTGGGCT GAACATCCAG CTTTATTTCT 

1251 GGAGCCTTCG TTGATTGATG GTGAAAACTA CAAAGATCAC TGTCCCTTTG 

1301 ACCCAAATGT GGAATCAAAG GAAGTGGCGC AGATGTTGGC GGTTGTTAGG 

1351 TGGTTTATTT CTACTTTGAG ATCTCAATAC TGCTCTAGAA GCGAATCGAT 

1401 GGGTTCTGAA AAGAAGCCTT TGAACCCATT CTTGGGTGAG GTATTTGTTG 

1451 GAAAGTGGAA AAATGATGAG CATCCAGAGT TTGGTGAAAC GGTTCTTTTA 

1501 AGTGAGCAAG TTTCACATCA TCCACCTATG ACAGCATTTT CGATTTTTAA 

1551 TGAAAAAAAT GATGTTTCTG TTCAAGGATA CAATCAAATT AAAACTGGTT 

1601 TTACCAAAAC ATTGACGCTA ACGGTCAAAC CATACGGGCA TGTCATTTTG 

1651 AAGATTAAAG ATGAGACCTA CCTGATTACA ACCCCGCCTT TGCATATCGA 

1701 AGGTATTTTA GTCGCTTCTC CATTTGTTGA ATTAGGAGGC AGGTCATTCA 

1751 TACAGTCATC AAATGGTATG TTATGTGTTA TAGAATTTTC AGGAAGGGGG 

1801 TATTTCACAG GGAAGAAGAA CTCCTTTAAG GCAAGAATTT ACAGAAGCCC 

1851 ACAAGAGCAT AGTCATAAAG AAAATGCGCT ATACCTAATC TCTGGCCAAT 

1901 GGTCAGGTGT TTCAACAATT ATAAAAAAAG ACTCGCAAGT TTCACATCAG 

1951 TTTTACGATT CATCGGAAAC TCCTACTGAA CATTTATTAG TTAAGCCAAT 

2001 GGAAGAAGAA CATCCTCTGG AAAGTAGGAG GGCATGGAAG GATGTGGCAG 

2051 AAGCAATCAG ACAAGGAAAT ATTAGTATGA TAAAAAAGAC TAAGGAAGAA

2101 CTAGAAAATA AGCAAAGAGC CTTGAGAGAA CAAGAACGCG TAAAAGGTGT 

2151 GGAATGGCAA AGAAGATGGT TCAAACAAGT GGACTACATG AATGAAAATA 

2201 CATCAAATGA TGTAGAGAAA GCAAGTGAAG ATGATGCCTT TAGGAAATTG 

2251 GCGTCCAAAC TGCAGCTTTC TGTGAAAAAT GTGCCAAGTG GGACATTGAT 

2301 TGGCGGCAAA GATGATAAGA AAGATGTTTC AACCGCATTG CATTGGAGGT 
FIGURE 63. BES1 Protein Sequence

Nature 387:98-102 [97313270] (1997) The nucleotide sequence of
Saccharomyces cerevisiae chromosome XV.

Dujon, B., Albermann, K., Aldea, M., Alexandraki, D., Ansorge, W.,
Arino, J., Benes, V., Bohn, C., Bolotin-Fukuhara, M., Bordonne, R.,
Boyer, J., Camasses, A., Casamayor, A., Casas, C., Cheret, G., et al.

YOR237W Length: 434 March 26, 1999 14:37 Type: P Check: 7501 

1 MSQHASSSSW TSFLKSISSF NGDLSSLSAP PFILSPTSLT EFSQYWAEHP 

51 ALFLEPSLID GENYKDHCPF DPNVESKEVA QMLAVVRWFI STLRSQYCSR

101 SESMGSEKKP LNPFLGEVFV GKWKNDEHPE FGETVLLSEQ VSHHPPMTAF 

151 SIFNEKNDVS VQGYNQIKTG FTKTLTLTVK PYGHVILKIK DETYLITTPP 

201 LHIEGILVAS PFVELGGRSF IQSSNGMLCV IEFSGRGYFT GKKNSFKARI

251 YRSPQEHSHK ENALYLISGQ WSGVSTIIKK DSQVSHQFYD SSETPTEHLL 

301 VKPIEEQHPL ESRRAWKDVA EAIRQGNISM IKKTKEELEN KQRALREQER 

351 VKGVEWQRRW FKQVDYMNEN TSNDVEKASE DDAFRKLASK LQLSVKNVPS 

401 GTLIGGKDDK KDVSTALHWR FDKNLMMREN EITI 
FIGURE 65. Rat Gene with Similarity to XJOtlOOv

LOCUS       1397235       334 aa                    04-FEB-1999
DEFINITION  ovarian-specific protein.
ACCESSION   1397235
PID         g1397235
DBSOURCE    locus RNU44803 accession U44803.1
KEYWORDS
SOURCE      Norway rat.
  ORGANISM  Rattus norvegicus
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Rodentia; Sciurognathi; Myomorpha; Muridae;
            Murinae; Rattus.
REFERENCE   1  (residues 1 to 334)
  AUTHORS   Duan,W.R., Linzer,D.I.H. and Gibori,G.
  TITLE     Cloning and characterization of an ovarian-specific protein that
            associates with the short form of the prolactin receptor
  JOURNAL   J. Biol. Chem. 271 (26), 15602-15607 (1996)
  MEDLINE   96279080
REFERENCE   2  (residues 1 to 334)
  AUTHORS   Gibori,G. and Duan,W.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (05-JAN-1996) Geula Gibori, Department of Physiology,
            University of Illinois at Chicago, Chicago, IL 60612, USA
COMMENT     Method: conceptual translation.
FEATURES             Location/Qualifiers
     source          1..334
                     /organism="Rattus norvegicus"
                     /strain="Sprague-Dawley"
                     /db_xref="taxon:10116"
                     /sex="female"
                     /tissue_type="corpus luteum"
                     /dev_stage="pregnant"
                     /cell_type="luteal"
     Protein         1..334
                     /product="ovarian-specific protein"
     CDS             1..334
                     /note="The protein can associate with the short form of
                     prolactin receptor in the rat corpus luteum."
                     /coded_by="U44803:15..1019"
ORIGIN
        1 mrkvvlitga ssgiglalcg rllaedddlh Iclacrnlsk agavrdalla shpsaevsiv
       61 qmdvsnlqsv vrgaeevkrr fqrldylyln agimpnpqln Ikaffcgifs rnvihmfsta
      121 eglltqndki tadgfqevfe tnlfghfili relepllchs dnpsqliwts srnakksnfs
      181 lediqhakgq epyssskyat dllnvalnrn fnqkglyssv tcpgvvmtnl tygilppfvw
      241 tlllpviwll rffahaftvt pyngaealvw Ifhqkpesln pltkylsgtt glgtnyvkgq
      301 kmdvdedtae kfyktllele kqvritiqks dhhs
//
FIGURE 66. DAK1 DNA Sequence

This sequence contains 1200bp of 5' promoter sequence.
Symbols: 1 to: 2955 from: chr13.gcg ck: 8335, 132275 to: 135229

Chromosome XIII Sequence

Nature 387:90-93 [97313268] (1997) The nucleotide sequence
of Saccharomyces cerevisiae chromosome XIII.
Bowman, S., Churcher, C., Badcock, K., Brown, D.,
Chillingworth, T., Connor, R., Dedman, K., Devlin, K.,
Gentles, S., Hamlin, N., Hunt, S., . . .

gcgseq.tmp.16080 Length: 2955 March 31, 1999 09:57 Type: N Check: 5254

1 TAATATAAAT ACTAGTCGTT AGATGATAGT TGCTTCTTAT TCCGAAAATG
51 AGTATGGAAG TGTTGCATAT GATAGGGCGG CTACAGTGAT GGTAAACATA 
101 AGATACTTTA GCGGGAAATT AGCAACTGGA AGTTAAATTA TCTAGACATA 
151 AGTGTGGCGG TCACGCTGAA CGCAGGAGAT CGGATAGATT GATAAGCTGA 
201 TCAAGAACAT TGATCGGTTT GTTGTTTAAA GAATGGTTTT TGAAAACGTT 
251 TGACCAGTTG CTTCTCCCAG ACGCTTACCG ATATGATGAT AAAGATAATA 
301 TCTTCAATTG AATACCCCGT GGATCAGCAC GAATAACAGA AAAAAAGGGT 
351 GAAATTCACC GTAAGCATGA TACGCACTAC GTTCTTCTTA CCTTTGCCAA 
401 CGTGTTGTCT TTGACGTACG TAATTATGGG AGATCGTTGA TGATTAGCCC 
451 CAGCTCACTT TCTTCTTAAT GACTGACCCG CTACTATCAA AATTAAGGTG 
501 TCAAATATCA TGATGAATGA GGTCTCTAGG CGACTCAATT ATACATCTTT 
551 TAGAGATTTT TTTACTACTT GCAGATAATT TCTCAAGGGA TTAGATTCAA 
601 ATCTGGCTTG TCAATTACGC CCTTTTCAAG CTCATCAAAT TGCGTATGTC 
651 ATTCATGCTT CCATTAGGAA CCATAGAAGC ATGGCTGAAA TGGCAATATA 
701 CGGCTTCCCA ATTTCAACTC TAAAGTAATG GCGGTCGAAT TTAATCTATA 
751 TTTTACAGTT TTATACGTAC TTTAAAAGCA ATCAGTAAAC ACCTCTGGTG 
801 CTATTCAAGG GTTTTTTGCC TTTATTTGTT ACTGTCAATT GTCTGGCGCT 
851 GTGATAAAAA ACAAGGCATA AAGCTCCCCC GTCATGAACA TTAAGACTCG 
901 CTAGACGAGA GAGTGAAATA TAATGCATTT CCTGATTTAA ATGCGCTACA 
951 AACATGGTGT AAATCTGGCC CGGAGTGAGT GCTTGCCAAT TTGGCTTCTA 
1001 AGGGAGAAAG ATCAAACCAC TCCCAATTGC GTCATTTTGA AAGAGTGGCC 
1051 ACCTCGCGAG CGTCTGTCGA ACTAACTGAT GAATAAATAT ATAAGGAGAA 
1101 AATCACTTCA ACTTCGCTAC AAGTAGTCAC TATTTGTAGC AACTGTAAAC 
1151 GAACACATCA AAGAATAAGA TTACATTCTA TATCTAAGAC TAAATTTTAA 
1201 ATGTCCGCTA AATCGTTTGA AGTCACAGAT CCAGTCAATT CAAGTCTCAA 
1251 AGGGTTTGCC CTTGCTAACC CCTCCATTAC GCTGGTCCCT GAAGAAAAAA 
1301 TTCTCTTCAG AAAGACCGAT TCCGACAAGA TCGCATTAAT TTCTGGTGGT 
1351 GGTAGTGGAC ATGAACCTAC ACACGCCGGT TTCATTGGTA AGGGTATGTT 
1401 GAGTGGCGCC GTGGTTGGCG AAATTTTTGC ATCCCCTTCA ACAAAACAGA 
1451 TTTTAAATGC AATCCGTTTA GTCAATGAAA ATGCGTCTGG CGTTTTATTG 
1501 ATTGTGAAGA ACTACACAGG TGATGTTTTG CATTTTGGTC TGTCCGCTGA 
1551 GAGAGCAAGA GCCTTGGGTA TTAACTGCCG CGTTGCTGTC ATAGGTGATG 
1601 ATGTTGCAGT TGGCAGAGAA AAGGGTGGTA TGGTTGGTAG AAGAGCATTG 
1651 GCAGGTACCG TTTTGGTTCA TAAGATTGTA GGTGCCTTCG CAGAAGAATA 
1701 TTCTAGTAAG TATGGCTTAG ACGGTACAGC TAAAGTGGCT AAAATTATCA 
1751 ACGACAATTT GGTGACCATT GGATCTTCTT TAGACCATTG TAAAGTTCCT

1801 GGCAGGAAAT TCGAAAGTGA ATTAAACGAA AAACAAATGG AATTGGGTAT 

1851 GGGTATTCAT AACGAACCTG GTGTGAAAGT TTTAGACCCT ATTCCTTCTA 

1901 CCGAAGACTT GATCTCCAAG TATATGCTAC CAAAACTATT GGATCCAAAC 

1951 GATAAGGATA GAGCTTTTGT AAAGTTTGAT GAAGATGATG AAGTTGTCTT 

2001 GTTAGTTAAC AATCTCGGCG GTGTTTCTAA TTTTGTTATT AGTTCTATCA

2051 CTTCCAAAAC TACGGATTTC TTAAAGGAAA ATTACAACAT AACCCCGGTT 

2101 CAAACAATTG CTGGCACATT GATGACCTCC TTCAATGGTA ATGGGTTCAG 

2151 TATCACATTA CTAAACGCCA CTAAGGCTAC AAAGGCTTTG CAATCTGATT 

2201 TTGAGGAGAT CAAATCAGTA CTAGACTTGT TGAACGCATT TACGAACGCA 

2251 CCGGGCTGGC CAATTGCAGA TTTTGAAAAG ACTTCTGCCC CATCTGTTAA 

2301 CGATGACTTG TTACATAATG AAGTAACAGC AAAGGCCGTC GGTACCTATG 

2351 ACTTTGACAA GTTTGCTGAG TGGATGAAGA GTGGTGCTGA ACAAGTTATC 

2401 AAGAGCGAAC CGCACATTAC GGAACTAGAC AATCAAGTTG GTGATGGTGA 

2451 TTGTGGTTAC ACTTTAGTGG CAGGAGTTAA AGGCATCACC GAAAACCTTG 

2501 ACAAGCTGTC GAAGGACTCA TTATCTCAGG CGGTTGCCCA AATTTCAGAT 

2551 TTCATTGAAG GCTCAATGGG AGGTACTTCT GGTGGTTTAT ATTCTATTCT 

2601 TTTGTCGGGT TTTTCACACG GATTAATTCA GGTTTGTAAA TCAAAGGATG 

2651 AACCCGTCAC TAAGGAAATT GTGGCTAAGT CACTCGGAAT TGCATTGGAT 

2701 ACTTTATACA AATATACAAA GGCAAGGAAG GGATCATCCA CCATGATTGA

2751 TGCTTTAGAA CCATTCGTTA AAGAATTTAC TGCATCTAAG GATTTCAATA 

2801 AGGCGGTAAA AGCTGCAGAG GAAGGTGCTA AATCCACTGC TACATTCGAG

2851 GCCAAATTTG GCAGAGCTTC GTATGTCGGC GATTCATCTC AAGTAGAAGA 

2901 TCCTGGTGCA GTAGGCCTAT GTGAGTTTTT GAAGGGGGTT CAAAGCGCCT 

2951 TGTAA 



FIGURE 66 (cont) 
FIGURE 67. DAK1 Protein Sequence

Nature 387:90-93 [97313268] (1997) The nucleotide sequence
of Saccharomyces cerevisiae chromosome XIII.

Bowman, S., Churcher, C., Badcock, K., Brown, D.,
Chillingworth, T., Connor, R., Dedman, K., Devlin, K.,
Gentles, S., Hamlin, N., Hunt, S., Jagels, K., Lye, G.,
Moule, S., Odell, C., Pearson, D., Rajandream, et al.

YML070W Length: 584 March 31, 1999 09:58 Type: P Check: 167

1 MSAKSFEVTD PVNSSLKGFA LANPSITLVP EEKILFRKTD SDKIALISGG 

51 GSGHEPTHAG FIGKGMLSGA VVGEIFASPS TKQILNAIRL VNENASGVLL

101 IVKNYTGDVL HFGLSAERAR ALGINCRVAV IGDDVAVGRE KGGMVGRRAL 

151 AGTVLVHKIV GAFAEEYSSK YGLDGTAKVA KIINDNLVTI GSSLDHCKVP 

201 GRKFESELNE KQMELGMGIH NEPGVKVLDP IPSTEDLISK YMLPKLLDPN 

251 DKDRAFVKFD EDDEVVLLVN NLGGVSNFVI SSITSKTTDF LKENYNITPV

301 QTIAGTLMTS FNGNGFSITL LNATKATKAL QSDFEEIKSV LDLLNAFTNA 

351 PGWPIADFEK TSAPSVNDDL LHNEVTAKAV GTYDFDKFAE WMKSGAEQVI

401 KSEPHITELD NQVGDGDCGY TLVAGVKGIT ENLDKLSKDS LSQAVAQISD 

451 FIEGSMGGTS GGLYSILLSG FSHGLIQVCK SKDEPVTKEI VAKSLGIALD 

501 TLYKYTKARK GSSTMIDALE PFVKEFTASK DFNKAVKAAE EGAKSTATFE 

551 AKFGRASYVG DSSQVEDPGA VGLCEFLKGV QSAL 
FIGURE 68. PGU1 DNA Sequence

DNA sequence includes 1200bp of 5' promoter sequence.
Symbols: 1 to: 2286 from: chr10.gcg ck: 4711, 721304 to: 723589
Chromosome X Sequence

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide
sequence of Saccharomyces cerevisiae chromosome X.
Galibert, F., Alexandraki, D., Baur, A., Boles, E.,
Chalwatzis, N., Chuat, J. C., Coster, F., Cziepluch, C., De
Haan, M., Domdey, H., . . .

gcgseq.tmp.30022 Length: 2286 March 31, 1999 09:20 Type: N Check: 4618

1 ATGATTCTGA CGACCCTTTG ATAGTGGCAA TGATCAAAAA GAAAAAAAAA 
51 AGATAAGACG GTAGTGTGAA GATGACATAT AGCGCTACTC TATACTCGTC 
101 CAACTTCGAA AATAATATGT GGTCGTTGGT ACGTTCAGAT AAGAGAATAC 
151 ATCTCGCGCG TACGCATAAT TGTGGTCTAA AAAACCGCTG AAATTTTCTC 
201 AATACTGAAT AGAATCACGC TACTACGACA AGACTCGGTT ACTGTGCCTA 
251 AAATAATCCT GTGATAAACG AGTTATGTTA AACGCAGTAC AGGGGTTAAA 
301 GGGCATTGAG TTTTTGTGAG TGGAAATGCC CCCGTTATAG CTTCCAGTTT 
351 AATTACAAAT TATCAATTTA AGCAAATATA ACTGGAGGAT TGGGGAGGCG 
401 ACTAAAAATG GCTACCACGC TATTAGACAT ACAACATTGA GTATTTTATG 
451 TAATTTTGTT ACTGCTAGCA CGGCCATGCA ATTGGCAACT GAAAGCTATC 
501 TGACAACTTA AATGATTCTT AAAACAATGA CGACTATAAT CTTCTCTAAG 
551 AAGTTTCATA TCCATCTTCC TCATTATTCA GTTTCTTTTT CCTCTTGAAA 
601 GTATCGTAAA GAACAACGTC TTCACATTAG CTATTAGAAG ACCATTGAAC 
651 TACCGGATAT GAGTAAGAGT GATCTTGCCG GAGAGATAAT AGCTGCACAA 
701 AGGCCAAGGA TTAGATTAAT GGGTGCATTG TACGAAAAAA AATAGTTTAC 
751 AGTCATTTAT TCGCAATAAA TCAATTTTTT TTTCAAAAAA TATGTAAGTC 
801 TGATAAAAAA TTCTTCACTG AAGAGAGATG CTTACATTCT AATTCTTGAA
851 TAAAAGACTC TCTAACGCTG TGAATTCTCT TTAGCTGTAA CGGAAACAGA 
901 GAGTTATTCC GTAGTCACTG AATTTTTTTT TTTTGACGCT ATTATTTAAA 
951 ACCTAGGATA TCCGTCCCAT ACAAAACGGC CACGAGTTTC JUVTCCCAGAA 
1001 TGTACGAGTT ATAATTCTCC TAGATGCATG ATACTCGTGC ATTCGTTTAA 
1051 CAATCATACC AATTTCCCAT TTTCGGGATA TTAAACATGA ACATACTTTT 
1101 TTACTGTGAG AATGTGGTTT CACAATTATT CCATACAGGT ATAAAAACGC 
1151 ACAGAACTTC AAACGGGAAG ACTATCTACC CACATTGATG GACAAACGCA 
1201 ATGATTTCTG CTAATTCATT ACTTATTTCC ACTTTGTGCG CTTTTGCGAT 
1251 CGCAACACCT TTGTCAAAAA GAGATTCCTG TACCCTAACA GGATCTTCTT 
1301 TGTCTTCACT CTCAACCGTG AAAAAATGTA GCAGCATCGT TATTAAAGAC 
1351 TTAACTGTCC CAGCTGGACA GACTTTAGAT TTAACTGGGT TAAGCAGTGG 
1401 TACTACTGTT ACGTTTGAAG GCACAACCAC ATTTCAGTAC AAGGAATGGA 
1451 GCGGCCCTTT AATTTCAATC TCAGGGTCTA AAATCAGCGT TGTTGGTGCT 
1501 TCGGGACATA CCATTGATGG TCAAGGAGCA AAATGGTGGG ATGGCTTAGG 
1551 TGATAGCGGT AAAGTCAAAC CGAAGTTTGT AAAGTTGGCG TTGACGGGAA 
1601 CATCTAAGGT CACCGGATTG AATATTAAAA ATGCTCCACA CCAAGTCTTC
1651 AGCATCAATA AATGTTCAGA TTTAACCATC AGCGACATAA CAATTGATAT 
1701 CAGAGACGGT GATTCGGCTG GTGGTCATAA TACGGATGGG TTTGATGTTG 
1751 GTAGTTCTAG TAACGTCTTA ATTCAAGGAT GTACTGTTTA TAATCAGGAT 
1801 GACTGTATTG CTGTGAATTC CGGTTCAACT ATTAAATTTA TGAACAACTA 
1851 CTGCTACAAT GGCCATGGTA TTTCTGTAGG TTCTGTTGGT GGCCGTTCTG

1901 ATAATACAGT CAATGGTTTC TGGGCTGAT^ ATAACCATGT TATCAACTCT 

1951 GACAACGGGT TGAGAATAAA AACCGTAGAA GGTGCGACAG GCACAGTCAC 

2001 TAATGTCAAC TTTATCAGTA ATAAAATTAG CGGCATAAAA AGTTATGGTA 

2051 TTGTTATCGA AGGCGATTAT TTGAATAGTA AGACTACTGG AACTGCTACA 

2101 GGTGGCGTTC CCATTTCGAA TTTAGTAATG AAGGATATCA CCGGGAGCGT 

2151 GAACTCCACA GCGAAGAGGG TTAAAATTTT GGTGAAAAAC GCTACTAACT 

2201 GGCAATGGTC TGGGGTGTCA ATTACCGGTG GTTCTTCCTA TTCTGGATGT 

2251 TCTGGAATCC CATCTGGATC TGGTGCAAGC TGTTAA 



FIGURE 68 (cont).
FIGURE 69. PGU1 Protein Sequence

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide
sequence of Saccharomyces cerevisiae chromosome X.

Galibert, F., Alexandraki, D., Baur, A., Boles, E.,
Chalwatzis, N., Chuat, J. C., Coster, F., Cziepluch, C., De
Haan, M., Domdey, H., Durand, P., Entian, K. D., Gatius, M.,
Goffeau, A., Grivell, L. A., et al.

YJR153W Length: 361 March 31, 1999 09:55 Type: P Check: 9795

1 MISANSLLIS TLCAFAIATP LSKRDSCTLT GSSLSSLSTV KKCSSIVIKD 

51 LTVPAGQTLD LTGLSSGTTV TFEGTTTFQY KEWSGPLISI SGSKISVVGA

101 SGHTIDGQGA KWWDGLGDSG KVKPKFVKLA LTGTSKVTGL NIKNAPHQVF 

151 SINKCSDLTI SDITIDIRDG DSAGGHNTDG FDVGSSSNVL IQGCTVYNQD 

201 DCIAVNSGST IKFMNNYCYN GHGISVGSVG GRSDNTVNGF WAENNHVINS 

251 DNGLRIKTVE GATGTVTNVN FISNKISGIK SYGIVIEGDY LNSKTTGTAT 

301 GGVPISNLVM KDITGSVNST AKRVKILVKN ATNWQWSGVS ITGGSSYSGC 

351 SGIPSGSGAS C 
FIGURE 70. STE18 DNA Sequence

This sequence contains 600bp of 5' promoter sequence.
Symbols: 1 to: 933 from: chr10.gcg ck: 4711, 585156 to: 586088

Chromosome X Sequence

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide
sequence of Saccharomyces cerevisiae chromosome X.
Galibert, F., Alexandraki, D., Baur, A., Boles, E.,
Chalwatzis, N., Chuat, J. C., Coster, F., Cziepluch, C., De
Haan, M., Domdey, H., . . .

gcgseq.tmp.6719 Length: 933 March 31, 1999 10:01 Type: N Check: 8833

1 TTCGTTTCTG TCTTGTCTCC CGCTGTTACC TAATAACTTC ATGTGATCTG 
51 CTCCCCCTTC TCGTTAAATA CCACCTTTTC ATCAACCCCG TAGGGCGCGA 
101 CACGTCTAAA ATATTAACCT CTGAATACTT ATTGGGTCAA AATGAATGTT 
151 GATAACTTTC CTTTACAA?\A AAAAAACTAA TAGAGTATAT GCATTTCGGT 
201 AGTGAAATAT TCGTTAATGC TAATATGCTC AGTAGTGATC CTAGATTACC 
251 AGTTTTACTG CAGCCATCGT ACAATTTTGG AACGAGTATA AAGAGAGAAA 
301 TTAAAAACGA CAAGAAATAT TCGTACTAGC TTCTCTTCCG GCTTGATGAC 
351 AGTCTTAATA TCATCTGCAA CTCTTGAAAT CTTGCTTTAT AGTCAAAATT 
401 TACGTACGCT TTTCACTATA TAATATGATT TGTCAATGTG ATGAGTGAAT 
451 GTCTCCCTGT TACCCGGTTT TCATGTTGAT TTTTGTTTCA GGCTCTAAAT 
501 GTTTGATGCA ATATTTAACA AGGAGAACAG AAATGTTTTG TGACAGCACC 
551 TGTCAATTTT AGGATAGTAG CAATCGCAAA CGTTCTCAAT AATTCTAAGA 
601 ATGACATCAG TTCAAAACTC TCCACGCTTA CAACAACCTC AGGAACAGCA 
651 ACAGCAACAG CAACAGCTTT CCTTAAAGAT J^AAACAATTG AAGTTTU^AAA 
701 GJVATCAACGA ACTTAACAAT AAACTGAGGA AAGAACTCAG CCGTGAAAGA 
751 ATTACTGCTT CAAATGCATG TCTTACAATA ATAAACTATA CCTCGAATAC 
801 AAAAGATTAT ACATTACCAG AACTATGGGG CTACCCCGTA GCAGGATCAA 
851 ATCATTTTAT AGAGGGTTTG AAAAATGCTC AAAAAAATAG CCAAATGTCA 
901 AACTCAAATA GTGTTTGTTG TACGCTTATG TAA 
FIGURE 71. STE18 Protein Sequence

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide
sequence of Saccharomyces cerevisiae chromosome X.

Galibert, F., Alexandraki, D., Baur, A., Boles, E.,
Chalwatzis, N., Chuat, J. C., Coster, F., Cziepluch, C., De
Haan, M., Domdey, H., Durand, P., Entian, K. D., Gatius, M.,
Goffeau, A., Grivell, L. A., et al.

YJR086W Length: 110 March 31, 1999 10:02 Type: P Check: 6859

1 MTSVQNSPRL QQPQEQQQQQ QQLSLKIKQL KLKRINELNN KLRKELSRER 
51 ITASNACLTI INYTSNTKDY TLPELWGYPV AGSNHFIEGL KNAQKNSQMS 
101 NSNSVCCTLM 
FIGURE 72. YGL198W DNA Sequence

This sequence contains 989bp of 5' promoter sequence.
Symbols: 1 to: 1775 from: chr7.gcg ck: 9962, 122605 to: 124379

Chromosome VII Sequence

Nature 387:81-84 [97313265] (1997) The nucleotide sequence
of Saccharomyces cerevisiae chromosome VII.
Tettelin, H., Agostoni Carbone, M. L., Albermann, K.,
Albers, M., Arroyo, J., Backes, U., Barreiros, T., Bertani,
I., Bjourson, A. J., . . .

gcgseq.tmp.32650 Length: 1775 March 31, 1999 10:03 Type: N Check: 2850

1 GAGAATTATT CGCGACTTCA GGTTATCCAA TCGTGTATGT AATCGTATGT 
51 AGGCAAT^GT AAATAGATAT GAACTACATT TTCCTGCTTT ACTTAGACTA 
101 GAGATGTGAC CTCAAAGAAT CTTCTCAAGT AGTATATCTG GAAAAGAGAG 
151 TTTGCAATAA CGACGCCCAA TTGGAAGATG GACCACCATT TAACACGATC 
201 GTTGGTCGAC TCTGCAGTAT TTCTATGCGT CCTTTCTCTA ATT^CAATAT 
251 AACTTTGTTC GTCCTTGACT TCCCTGGTTA ATTTGGACAA CTTTCTGACA 
301 GCACTATCCA ATGTATTGGT GTTTGGGTCG TCCAAATCCA CATATACCAC 
351 CCCATGAATG TTGAAAGTCA CGTCTTTTGT CTCGATACCG GTGTTCTCGT 
401 TCAAGAAACA GTATTGGAAA TGTCCCTTGT ATGGAGCAGA CAATGTGATT 
451 TCACCGTGCG ACGTGTCCCT AACCGTTTTC AAAACTTCAT GTCTTTCCGG 
501 CCCGTAGATG ATAAAGTCAC CAGTCAGCTG GCTACTGGAT TGAGGGTTTC 
551 TATCACCGAA CTGGAACGAA ATGGAGAGCT CGTCACCCTT ACTCAAGTCT 
601 TCGAAGAAGC ATCTACGGCC ATAAGCTGGA AGAAGGACAT TATGGGCGGA 
651 CGCCGAGAAG AACAGGAAGC AAGCAATGAC AAACTTAGTA GCAAATGAGG 
701 CCATCCTTAT GCGTGTGTAT TTTTGTGCGG AGGGATACTA TTAAGATTGC 
751 AGTTTCACCA AGTATAGCTT TTTATTTCAT TATAAGTTTC GTGTCAAAAT 
801 GTTTAAGCGA CCCGATCTCT CAGGCTGTTT TGCACGACTT TTCTGACTTT 
851 CCTCGCGTCT TTTTTCATGA AAATTGGATT ACCCGGAGTG ATGATTTTCT 
901 CACAGTGATT TTTCGTCCCC TTTTACAATA GCAAATGAAG CTGTTTTAGC 
951 AATATTTGTA GAAAGATATG TCACAAGAGG GCAGGCAAAA TGTCATACGG 
1001 AAGAGAAGAC ACTACGATTG AGCCCGACTT CATAGAACCA GATGCACCTT 
1051 TGGCTGCTTC CGGGGGTGTT GCTGACAACA TAGGCGGAAC TATGCAGAAT 
1101 TCAGGCAGCA GAGGGACGCT CGACGAGACT GTGCTGCAAA CACTAAAGCG 
1151 AGATGTGGTG GAGATTAATT CCAGACTGAA ACAAGTGGTA TACCCGCATT 
1201 TCCCCTCATT CTTTAGCCCC TCTGATGACG GGATAGGGGC GGCTGATAAC 
1251 GACATTTCAG CCAATTGCGA CCTGTGGGCG CCCCTTGCGT TTATCATATT 
1301 GTATTCTCTA TTTGTATCGC ATGCGCGGTC GCTGTTCTCG AGCCTATTTG 
1351 TGTCTAGTTG GTTCATTTTG CTGGTGATGG CATTGCATCT GAGACTCACC 
1401 AAGCCACACC AGAGGGTGTC GCTGATTTCG TACATCTCCA TTTCCGGGTA 
1451 TTGCTTATTC CCACAAGTGC TGAATGCCTT AGTCTCGCAG ATACTACTTC 
1501 CATTGGCCTA CCATATTGGA AAGCAAAATC GCTGGATTGT GAGGGTCCTG 
1551 TCGCTCGTGA AACTGGTGGT CATGGCGCTG TGCCTGATGT GGTCTGTGGC 
1601 CGCCGTTTCG TGGGTTACCA AGAGCAAGAC CATTATCGAG ATATACCTCT 
1651 GGCACTCTGT CTTTTTTGGC ATGGCTGGTT GTCAACTATT TTATAACACT
1701 AGTTACATAT GTATAAAACC CAATATTCAT GGACATAGAA TTGCCTATCT 
1751 CGCGAGCCAC GGCAGAAAGT TCTGA 
FIGURE 13. YGL198W Protein Sequence

Nature 387:81-84 [97313265] (1997) The nucleotide sequence
of Saccharomyces cerevisiae chromosome VII.

Tettelin, H., Agostoni Carbone, M. L., Albermann, K.,
Albers, M., Arroyo, J., Backes, U., Barreiros, T., Bertani,
I., Bjourson, A. J., Bruckner, M., Bruschi, C. V.,
Carignani, G., Castagnoli, L., Cerdan, et al.



YGL198W Length: 261 March 31, 1999 10:05 Type: P Check: 1705

1 MSYGREDTTI EPDFIEPDAP LAASGGVADN IGGTMQNSGS RGTLDETVLQ 

51 TLKRDVVEIN SRLKQVVYPH FPSFFSPSDD GIGAADNDIS ANCDLWAPLA

101 FIILYSLFVS HARSLFSSLF VSSWFILLVM ALHLRLTKPH QRVSLISYIS 

151 ISGYCLFPQV LNALVSQILL PLAYHIGKQN RWIVRVLSLV KLVVMALCLM

201 WSVARVSWVT KSKTIIEIYL WHSVFFGMAG CQLFYNTSYI CIKPNIHGHR 

251 IAYLASHGRK F






SUBSTITUTE SHEET (RULE 26) 



WO 00/58521 PCT/US00/08604







(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)



(19) World Intellectual Property Organization 
International Bureau 

(43) International Publication Date 
11 January 2001 (11.01.2001) 




PCT 







(10) International Publication Number
WO 01/02550 A2



(51) International Patent Classification: C12N 15/00



(21) International Application Number: PCT/BB 



• If/Ill 



77 



(22) International Filing Date: 3 July 2000 (03.07.2000) 



Beerse (BE). REEKMANS, Rieka, Josephina [BE/BE]; 
>^jnbergstraat 190, B-8560 Wevelgem (BE). 

(74) Agent: COIGNEZ, Keen; De Clercq, Brants & Partners
c.v., E. Gevaertdreef 10 a, B-9830 Sint-Martens-Latem (BE).



(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data:
99870141.1 1 July 1999 (01.07.1999) EP



(71) Applicant (for all designated States except US):
JANSSEN PHARMACEUTICA N.V. [BE/BE]; Turnhoutseweg 30, B-2340 Beerse (BE).

(72) Inventors; and 

(75) Inventors/Applicants (for US only): CONTRERAS,
Roland, Henri [BE/BE]; Molenstraat 53, B-9820
Schelderode (BE). DE BACKER, Marianne, Denise
[BE/BE]; Janssen Pharmaceutica N.V., Turnhoutseweg
30, B-2340 Beerse (BE). LUYTEN, Walter, Herman,
Maria, Louis [BE/BE]; Janssen Pharmaceutica N.V.,
Turnhoutseweg 30, B-2340 Beerse (BE). MALCORPS,
Isabelle, Karin, Luc [BE/BE]; Begijnenstraat 18, B-2980
Zoersel (BE). NEUSSEN, Bart, Jozef, Maria [BE/BE];
Janssen Pharmaceutica N.V., Turnhoutseweg 30, B-2340



(81) Designated States (national): AE, AG, AL, AM, AT, AU,
AZ, BA, BB, BG, BR, BY, CA, CH, CN, CR, CU, CZ, DE,
DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU,
ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS,
LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, NO, NZ,
PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT,
TZ, UA, UG, US, UZ, VN, YU, ZA, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), Eurasian
patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE,
IT, LU, MC, NL, PT, SE), OAPI patent (BF, BJ, CF, CG,
CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).

Published: 

— Without international search report and to be republished 
upon receipt of that report. 

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.



(54) Title: CELL DEATH RELATED DRUG TARGETS IN YEAST AND FUNGI

(57) Abstract: The invention describes the use of nucleic acids and polypeptides which are involved in a pathway eventually leading
to programmed cell death of yeast or fungi for the preparation of a medicament for treating diseases associated with yeast or
fungi or for the treatment of proliferative disorders or for preventing apoptosis in certain diseases. Methods are provided to identify
compounds which selectively modulate the expression or functionality of said polypeptides in the same or a parallel pathway.
Also provided are compounds as well as pharmaceutical compositions, medicaments and vaccines. The invention also comprises
polypeptides and antibodies raised against said polypeptides.
CELL DEATH RELATED DRUG TARGETS IN YEAST AND FUNGI

The present invention relates to the identification of genes, and of the proteins
encoded thereby, from yeast and fungi whose expression is modulated upon
programmed cell death. These genes, proteins or functional fragments and
equivalents thereof may be used as selective targets for drugs to treat infections
caused by or associated with yeast and fungi, for the treatment of proliferative
disorders, or for the prevention of apoptosis in certain diseases.

Invasive fungal infections (e.g. Candida spp., Aspergillus spp., Fusarium spp.,
Zygomycetes spp.) (Walsh, 1992) have emerged during the past two decades as an
important cause of formidable morbidity and mortality in an increasingly
diverse and progressively expanding population of immunocompromised patients.
Those with the acquired immune deficiency syndrome (AIDS) constitute the most
rapidly growing group of patients at risk for life-threatening mycosis. But fungal
infections have also increased in frequency in several populations of other susceptible
hosts, including very-low-birth-weight infants, cancer patients receiving chemotherapy,
organ transplant recipients, burn patients and surgical patients with complications.

These fungal infections are not limited to humans and other mammals, but are
also important in plants, where they can cause diseases or the production of
unwanted compounds (e.g. Fusarium spp., Aspergillus spp., Botrytis spp.,
Cladosporium spp.).

Although recent advances in antifungal chemotherapy have had an impact on
these mycoses, expanding populations of immunocompromised patients will require
newer approaches to antifungal therapy. The discovery of novel antifungal agents is
thus an essential element of any new antifungal therapy.

Classical approaches for identifying anti-fungal compounds have relied almost
exclusively on inhibition of fungal or yeast growth as an endpoint. Libraries of natural
products and of semi-synthetic or synthetic chemicals are screened for their ability to kill or
arrest growth of the target pathogen or a related nonpathogenic model organism.
These tests are cumbersome and provide no information about a compound's
mechanism of action. The promising lead compounds that emerge from such screens
must then be tested for possible host toxicity, and detailed mechanism-of-action studies
must subsequently be conducted to identify the affected molecular target.

Cells from multicellular organisms can commit suicide in response to specific
signals or injury by an intrinsic program of cell death. Apoptosis is a form of
programmed cell death which leads to elimination of unnecessary or damaged cells. To
survive, all cells from multicellular organisms depend on the constant repression of this
suicide program by signals from other cells (Raff, 1992). It has been assumed that
such an altruistic form of cell survival arose with multicellularity and would have been
counterselected in unicellular organisms. Recent findings indicate, however, that a
similar process of cell survival also operates in single-celled eukaryotes.

It has been found that expression of the mammalian Bax gene triggers cell
death in Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces
pombe with morphological changes similar to apoptosis (Jurgensmeier et al., 1997).
However, the mechanism of Bax lethality in S. cerevisiae remains unclear.
The discovery that the mammalian Bax gene triggers apoptotic changes in yeast
(Ligr et al., 1998) suggests that the molecular pathways eventually leading to
programmed cell death may also be partially present in yeast cells and other
unicellular eukaryotes.

It is an aim of the present invention to provide nucleic acid as well as
polypeptide sequences which represent potential molecular targets for the identification
of new compounds which can be used in alleviating diseases or conditions associated
with yeast or fungal infections.

It is a further aim of the present invention to provide uses of these nucleic acid
and amino acid molecules for the preparation of a medicament for treating diseases
associated with yeast or fungi.

It is also an aim of the invention to provide pharmaceutical compositions and 
vaccines comprising these nucleic acids or polypeptides. 

It is also an aim of the present invention to provide vectors comprising these
nucleic acids, as well as host cells transfected or transformed with said vectors.

It is also an aim of the invention to provide antibodies against these
polypeptides, which can be used as such, or in a composition as a medicine for treating
diseases associated with yeast and fungi.

It is another aim of the invention to provide methods to selectively identify
compounds capable of inhibiting or activating expression of such polypeptides in yeast
or fungal infections. The nucleic acid and polypeptide molecules alternatively can be
incorporated into an assay or kit to identify these compounds.

It is also an aim of the invention to provide a method of preventing infection with
yeast or fungi.

It is also an aim of the invention to provide probes and primers derived from the
nucleic acid sequences of the invention.



Deciphering the Message in Protein Sequences:
Tolerance to Amino Acid Substitutions

James U. Bowie, John F. Reidhaar-Olson, Wendell A. Lim,
Robert T. Sauer



An amino acid sequence encodes a message that determines the shape and function of a protein. This message is highly degenerate in that many different sequences can code for proteins with essentially the same structure and activity. Comparison of different sequences with similar messages can reveal key features of the code and improve understanding of how a protein folds and how it performs its function.



THE GENOME IS MANIFEST LARGELY IN THE SET OF PROTEINS that it encodes. It is the ability of these proteins to fold into unique three-dimensional structures that allows them to function and carry out the instructions of the genome. Thus, comprehending the rules that relate amino acid sequence to structure is fundamental to an understanding of biological processes. Because an amino acid sequence contains all of the information necessary to determine the structure of a protein (1), it should be possible to predict structure from sequence, and subsequently to infer detailed aspects of function from the structure. However, both problems are extremely complex, and it seems unlikely that either will be solved in an exact manner in the near future. It may be possible to obtain approximate solutions by using experimental data to simplify the problem. In this article, we describe how an analysis of allowed amino acid substitutions in proteins can be used to reduce the complexity of sequences and reveal important aspects of structure and function.



Methods for Studying Tolerance to
Sequence Variation

There are two main approaches to studying the tolerance of an amino acid sequence to change. The first method relies on the process of evolution, in which mutations are either accepted or rejected by natural selection. This method has been extremely powerful for proteins such as the globins or cytochromes, for which sequences from many different species are known (2-7). The second approach uses genetic methods to introduce amino acid changes at specific positions in a cloned gene and uses selections or screens to identify functional sequences. This approach has been used to great advantage for proteins that can be expressed in bacteria or yeast, where the appropriate genetic manipulations are possible (8-17). The end results of both methods are lists of active sequences that can be compared and analyzed to identify sequence features that are essential for folding or function. If a particular property of a side chain, such as charge or size, is important at a given position, only side chains that have that property will be allowed. Conversely, if the chemical identity of the side chain is unimportant, then many different substitutions will be permitted.

The authors are in the Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139.
*Present address: Department of Chemistry and Biochemistry and the Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90024.

Studies in which these methods were used have revealed that proteins are surprisingly tolerant of amino acid substitutions. For example, in studying the effects of approximately 1500 single amino acid substitutions at 142 positions in lac repressor, Miller and co-workers found that about one-half of all substitutions were phenotypically silent (11). At some positions, many different, nonconservative substitutions were allowed. Such residue positions play little or no role in structure and function. At other positions, no substitutions or only conservative substitutions were allowed. These residues are the most important for lac repressor activity.

What roles do invariant and conserved side chains play in proteins? Residues that are directly involved in protein functions such as binding or catalysis will certainly be among the most conserved. For example, replacing the Asp in the catalytic triad of trypsin with Asn results in a 10^4-fold reduction in activity (12). A similar loss of activity occurs in λ repressor when a DNA binding residue is changed from Asn to Asp (13). To carry out their function, however, these catalytic residues and binding residues must be precisely oriented in three dimensions. Consequently, mutations in residues that are required for structure formation or stability can also have dramatic effects on activity (10, 14-16). Hence, many of the residues that are conserved in sets of related sequences play structural roles.



Substitutions at Surface and Buried Positions

In their initial comparisons of the globin sequences, Perutz and co-workers found that most buried residues require nonpolar side chains, whereas few features of surface side chains are generally conserved (6). Similar results have been seen for a number of protein families (2, 4, 5, 7, 17, 18). An example of the sequence tolerance at surface versus buried sites can be seen in Fig. 1, which shows the allowed substitutions in λ repressor at residue positions that are near the dimer interface but distant from the DNA binding surface of the protein (9). These substitutions were identified by a functional selection after cassette mutagenesis. A histogram of side chain solvent accessibility in the crystal structure of the dimer is also shown in Fig. 1. At six positions, only the wild-type residue or relatively conservative substitutions are allowed. Five of these positions are buried in the protein. In contrast, most of the highly exposed positions tolerate a wide range of chemically different side chains, including hydrophilic and hydrophobic residues. Hence, it seems that most of the structural information in this region of the protein is carried by the residues that are solvent inaccessible.

Fig. 1. ... short region of λ repressor. The wild-type sequence is shown along the center line. The allowed substitutions shown above each position were identified by randomly mutating one to three codons at a time by using a cassette method and applying a functional selection (9). (B) The fractional solvent accessibility (42) of the wild-type side chain in the protein dimer (43) relative to the same atoms in an Ala-X-Ala model tripeptide.

Constraints on Core Sequences

Because core residue positions appear to be extremely important for protein folding or stability, we must understand the factors that dictate whether a given core sequence will be acceptable. In general, only hydrophobic or neutral residues are tolerated at buried sites in proteins, undoubtedly because of the large favorable contribution of the hydrophobic effect to protein stability (19). For example, Fig. 2 shows the results of genetic studies used to investigate the substitutions allowed at residue positions that form the hydrophobic core of the NH2-terminal domain of λ repressor (20). The acceptable core sequences are composed almost exclusively of Ala, Cys, Thr, Val, Ile, Leu, Met, and Phe. The acceptability of many different residues at each core position presumably reflects the fact that the hydrophobic effect, unlike hydrogen bonding, does not depend on specific residue pairings. Although it is possible to imagine a hypothetical core structure that is stabilized exclusively by residues forming hydrogen bonds and salt bridges, such a core would probably be difficult to construct because hydrogen bonds require pairing of donors and acceptors in an exact geometry. Thus the repertoire of possible structures that use a polar core would probably be extremely limited (21). Polar and charged residues are occasionally found in the cores of proteins, but only at positions where their hydrogen bonding needs can be satisfied (22).

The cores of most proteins are quite closely packed (23), but some volume changes are acceptable. In λ repressor, the overall core volume of acceptable sequences can vary by about 10%. Changes at individual sites, however, can be considerably larger. For example, as shown in Fig. 2, both Phe and Ala are allowed at the same core position in the appropriate sequence contexts. Large volume changes at individual buried sites have also been observed in phylogenetic studies, where it has been noted that the size decreases and increases at interacting residues are not necessarily related in a simple complementary fashion (J, 7, I?). Rather, local volume changes are accommodated by conformational changes in nearby side chains and by a variety of backbone movements.



The Informational Importance of the Core

With occasional exceptions, the core must remain hydrophobic and maintain a reasonable packing density. However, since the core is composed of side chains that can assume only a limited number of conformations (24), efficient packing must be maintained without steric clashes. How important are hydrophobicity, volume, and steric complementarity in determining whether a given sequence can form an acceptable core? Each factor is essential in a physical sense, as a stable core is probably unable to tolerate unsatisfied hydrogen bonding groups, large holes, or steric overlaps (25). However, in an informational sense, these factors are not equivalent. For example, in experiments in which three core residues of λ repressor were mutated simultaneously, volume was a relatively unimportant informational constraint because three-quarters of all possible combinations of the 20 naturally occurring amino acids had volumes within the range tolerated in the core, and yet most of these sequences were unacceptable (20). In contrast, of the sequences that contained only the appropriate hydrophobic residues, a significant fraction were acceptable. Hence, the hydrophobicity of a sequence contains more information about its potential acceptability in the core than does the total side chain volume. Steric compatibility was intermediate between volume and hydrophobicity in informational importance.

Fig. 2. Amino acid substitutions allowed in the core of λ repressor. The wild-type side chains are shown pictorially in the approximate orientation seen in the crystal structure (43). The lists of allowed substitutions at each position are shown below the wild-type side chains. These substitutions were identified by randomly mutating one to four residues at a time by using a cassette method and applying a functional selection (20). Not all substitutions are allowed in every sequence background.



The Informational Importance of Surface Sites

We have noted that many surface sites can tolerate a wide variety of side chains, including hydrophilic and hydrophobic residues. This result might be taken to indicate that these positions contain little information. However, Bashford et al., in an extensive analysis of globin sequences (4), found a strong bias against large hydrophobic residues at many surface positions. At one level, this may reflect constraints imposed by protein solubility, because large patches of hydrophobic surface residues would presumably lead to aggregation. At a more fundamental level, protein folding requires a partitioning between surface and buried positions. Consequently, to achieve a unique native state without significant competition from other conformations, it may be important that some sites have a decided preference for exterior rather than interior positions. As a result, many surface sites can accept hydrophobic residues individually, but the surface as a whole can probably tolerate only a moderate number of hydrophobic side chains.



Identification of Residue Roles from
Sets of Sequences

Often, a protein of interest is a member of a family of related sequences. What can we infer from the pattern of allowed substitutions at positions in sets of aligned sequences generated by genetic or phylogenetic methods? Residue positions that can accept a number of different side chains, including charged or highly polar residues, are almost certain to be on the protein surface. Residue positions that remain hydrophobic, whether variable or not, are likely to be buried within the structure. In Fig. 3, those residue positions in λ repressor that can accept hydrophilic side chains are shown in orange and those that cannot accept hydrophilic side chains are shown in green. The obligate hydrophobic positions define the core of the structure, whereas positions that can accept hydrophilic side chains define the surface.
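The surface/buried rule just described can be sketched as a column-by-column test on an alignment. This is a minimal illustration under stated assumptions, not the authors' procedure. The residue classes `POLAR_OR_CHARGED` and `HYDROPHOBIC` are choices made for this example: the hydrophobic set reuses the core residues listed above for λ repressor. The function name `classify_columns` and the toy alignment are hypothetical. For simplicity, a single charged or highly polar residue in a column is taken as enough to call it "surface".

```python
# Residue classes are illustrative assumptions, not a published scale.
POLAR_OR_CHARGED = set("DEKRNQH")   # charged or highly polar side chains
HYDROPHOBIC = set("ACTVILMF")       # residues accepted in the lambda repressor core


def classify_columns(alignment):
    """Label each alignment column 'surface', 'buried', or 'unassigned'.

    alignment: list of equal-length, gap-free sequences (one-letter codes).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    labels = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        if column & POLAR_OR_CHARGED:
            labels.append("surface")      # accepts charged or highly polar residues
        elif column <= HYDROPHOBIC:
            labels.append("buried")       # stays hydrophobic, variable or not
        else:
            labels.append("unassigned")   # e.g. only Gly, Pro, Ser, Trp or Tyr seen
    return labels


toy_alignment = ["LKEVAD", "IREVGN", "VKDLAQ"]   # hypothetical aligned variants
print(classify_columns(toy_alignment))
```

Columns containing only residues outside both sets are left unassigned rather than forced into either class, because the text gives no rule for them.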

Functionally important residues should be conserved in sets of active sequences, but it is not possible to decide whether a side chain is functionally or structurally important just because it is invariant or conserved. To make this distinction requires an independent assay of protein folding. The ability of a mutant protein to maintain a stably folded structure can often be measured by biophysical techniques, by susceptibility to intracellular proteolysis, or by binding to antibodies specific for the native structure (27, 29). In the latter case, it is possible to screen proteins in mutagenized clones for the ability to fold even if these proteins are inactive. Sets of sequences that allow formation of a stable structure can then be compared to the sets that allow both folding and function, with the active site or binding residues being those that are variable in the set of stable proteins but invariant in the set of functional proteins. The DNA-binding residues of Arc repressor were identified by this method (8). The receptor-binding residues of human growth hormone were also identified by comparing the stabilities and activities of a set of mutant sequences (28). However, in this case, the mutants were generated as hybrid sequences between growth hormone and related hormones with different binding specificities.
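The comparison described above reduces to a set test at each aligned position: a candidate binding or active-site residue varies among variants that merely fold stably but stays invariant among variants that are also functional. The sketch below assumes both sets are already aligned and gap-free. The function name `functional_positions` and the toy sequences are hypothetical. Positions are reported 0-based.

```python
def functional_positions(stable, functional):
    """Return 0-based positions that are invariant among functional sequences
    but variable among stably folded ones (candidate functional residues)."""
    length = len(stable[0])
    if any(len(seq) != length for seq in stable + functional):
        raise ValueError("all sequences must be aligned to the same length")
    hits = []
    for i in range(length):
        stable_residues = {seq[i] for seq in stable}
        functional_residues = {seq[i] for seq in functional}
        # Invariant when function is required, tolerant when only folding is.
        if len(functional_residues) == 1 and len(stable_residues) > 1:
            hits.append(i)
    return hits


stable = ["MKRSE", "MARSE", "MKLSD"]   # hypothetical: fold, activity not required
functional = ["MKRSE", "MKRSD"]        # hypothetical: fold and are active
print(functional_positions(stable, functional))
```

In this toy data, position 0 is invariant in both sets, so the comparison cannot say whether it is structural or functional. This mirrors the point above that conservation alone is not enough.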



Implications for Structure Prediction

At present, the only reliable method for predicting a low-resolution tertiary structure of a new protein is by identifying sequence similarity to a protein whose structure is already known (29, 30). However, it is often difficult to align sequences as the degree of sequence similarity decreases, and it is sometimes impossible to detect statistically significant sequence similarity between distantly related proteins. Because the number of known sequences is greater than the number of known structures, it would be advantageous to increase the reach of the available structural information by improving methods for detecting distant sequence relations and for subsequently aligning these sequences based on structural principles. In a normal homology search, the sequence database is scanned with a single sequence, and every residue must be weighted equally. However, some residues are more important than others and should be weighted accordingly. Moreover, certain regions of the protein are more likely to contain gaps than others. Both kinds of information can be obtained from sequence sets, and several techniques have been used to combine such information into more appropriately weighted sequence searches and alignments (31). These methods were used to align the sequences of retroviral proteases with aspartic proteases, which in turn allowed construction of a three-dimensional model for the protease of human immunodeficiency virus type 1 (29). Comparison with the recently determined crystal structure of this protein revealed reasonable agreement in many areas of the predicted structure (32).

Fig. 3. Tolerance of positions in the NH2-terminal domain of λ repressor to hydrophilic side chains. The complex (43) of the repressor dimer (blue) and operator DNA (white) is shown. In (A), positions that can tolerate hydrophilic side chains are shown in orange. The same side chains are shown in (B) without the remaining protein atoms. In (C), positions that require hydrophobic or neutral side chains are shown in green. These side chains are shown in (D) without the remaining protein atoms. About three-fourths of the 92 side chains in the NH2-terminal domain are included in (B) and (D). The remaining positions have not been tested. Data are from (9, 14, 20, if, 44).

The structural information at most surface sites is highly degenerate. Except for functionally important residues, exterior positions seem to be important chiefly in maintaining a reasonably polar surface. The information contained in buried residues is also degenerate, the main requirement being that these residues remain hydrophobic. Thus, at its most basic level, the key structural message in an amino acid sequence may reside in its specific pattern of hydrophobic and hydrophilic residues. This is meant in an informational sense. Clearly, the precise structure and stability of a protein depends on a large number of detailed interactions. It is possible, however, that structural prediction at a more primitive level can be accomplished by concentrating on the most basic informational aspects of an amino acid sequence. For example, amphipathic patterns can be extracted from aligned sets of sequences and used, in some cases, to identify secondary structures.

If a region of secondary structure is packed against the hydrophobic core, a pattern of hydrophobic residues reflecting the periodicity of the secondary structure is expected (33, 34). These patterns can be obscured in individual sequences by hydrophobic residues on the protein surface. It is rare, however, for a surface position to remain hydrophobic over the course of evolution. Consequently, the amphipathic patterns expected for simple secondary structures can be much clearer in a set of related sequences (tf). This principle is illustrated in Fig. 4, which shows helical hydrophobic moment plots for the Antennapedia homeodomain sequence (Fig. 4A) and for a composite sequence derived from a set of homologous homeodomain proteins (Fig. 4B) (35). The hydrophobic moment is a simple measure of the degree of amphipathic character of a sequence in a given secondary structure (34). The amphipathic character of the three α-helical regions in the Antennapedia protein (36) is clearly revealed only by the analysis of the combined set of homeodomain sequences. The secondary structure of Arc repressor, a small DNA-binding protein, was recently predicted by a similar method (8) and confirmed by nuclear magnetic resonance studies (37).

The specific pattern of hydrophobic and hydrophilic residues in an amino acid sequence must limit the number of different structures a given sequence can adopt and may indeed define its overall fold. If this is true, then the arrangement of hydrophobic and hydrophilic residues should be a characteristic feature of a particular fold. Sweet and Eisenberg have shown that the correlation of the pattern of hydrophobicity between two protein sequences is a good criterion for their structural relatedness (38). In addition, several studies indicate that patterns of obligatory hydrophobic positions identified from aligned sequences are distinctive features of sequences that adopt the same structure (4, 79, 38, 39). Thus, the order of hydrophobic and hydrophilic residues in a sequence may actually be sufficient information to determine the basic folding pattern of a protein sequence.

Although the pattern of sequence hydrophobicity may be a characteristic feature of a particular fold, it is not yet clear how such patterns could be used for prediction of structure de novo. It is important to understand how patterns in sequence space can be related to structures in conformation space. Lau and Dill have approached this problem by studying the properties of simple sequences composed only of H (hydrophobic) and P (polar) groups on two-dimensional lattices (40). An example of such a representation is shown in Fig. 5. Residues adjacent in the sequence must occupy adjacent squares on the lattice, and two residues cannot occupy the same space. Free energies of particular conformations are evaluated with a single term, an attraction of H groups. By considering chains of ten residues, an exhaustive conformational search for all 1024 possible sequences of H and P residues was possible. For longer sequences only a representative fraction of the allowed sequence or conformation space could be explored. The significant results were as follows: (i) not all sequences can fold into a "native" structure and only a few sequences form a unique native structure; (ii) the probability that a sequence will adopt a unique native structure increases with chain length; and (iii) the native states are compact, contain a hydrophobic core surrounded by polar residues, and contain significant secondary structure. Although the gap between these two-dimensional simulations and three-dimensional structures is large, the use of simple rules and sequence representations yields results similar to those expected for real proteins. Three-dimensional lattice methods are also beginning to be developed and evaluated (41).
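The lattice model just described can be sketched as an exhaustive enumeration of self-avoiding walks on a square lattice. It uses a single energy term of minus one unit per contact between H residues that are lattice neighbors but not chain neighbors. This is a minimal reconstruction from the description above, not Lau and Dill's code. The function names and the symmetry-reduction convention are assumptions. The convention fixes the first bond along +x and sends the first bend toward +y, so each conformation is counted once up to rotation and reflection.

```python
# Residues are "H" (hydrophobic) or "P" (polar); energy is -1 per H-H contact.
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def conformations(n):
    """Yield self-avoiding walks of n >= 2 residues on the square lattice.

    Symmetry-related copies are removed: the first bond points along +x
    (removes rotations) and the first bend, if any, turns toward +y
    (removes the mirror image).
    """
    if n < 2:
        raise ValueError("need at least two residues")
    path = [(0, 0), (1, 0)]
    occupied = set(path)

    def extend(bent):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            if not bent and dy == -1:
                continue  # the first bend must go toward +y
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue  # two residues cannot occupy the same site
            path.append(nxt)
            occupied.add(nxt)
            yield from extend(bent or dy != 0)
            path.pop()
            occupied.remove(nxt)

    yield from extend(False)


def hh_contacts(seq, conf):
    """Count H-H pairs on neighbouring lattice sites that are not chain neighbours."""
    index = {site: i for i, site in enumerate(conf)}
    contacts = 0
    for i, (x, y) in enumerate(conf):
        if seq[i] != "H":
            continue
        for dx, dy in STEPS:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                contacts += 1  # j > i + 1 counts each non-bonded pair once
    return contacts


def ground_states(seq):
    """Return (lowest energy, number of conformations reaching it)."""
    energies = [-hh_contacts(seq, conf) for conf in conformations(len(seq))]
    best = min(energies)
    return best, energies.count(best)


print(ground_states("HPPH"))
```

A sequence has a unique native structure in this sense when the second value returned is 1. Enumerating the conformations of a 10-residue chain once and scoring all 1024 H/P sequences against them is cheap, which matches the exhaustive search described above.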



Summary

There is more information in a set of related sequences than in a single sequence. A number of practical applications arise from an analysis of the tolerance of sequences to change. First, such information permits the evaluation of a residue's importance to the function and stability of a protein. This ability to identify the essential elements of a protein sequence may improve our understanding of the determinants of protein folding and stability as well as protein function. Second, patterns of tolerance to amino acid substitutions of varying hydrophilicity can help to identify residues likely to be buried in a protein structure and those likely to occupy
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FlQ. 4. Helical hydro- 
ptobic momcntt caJcu- 
uicd by using (A) the 
Anienn a pcdi a no mcodo- 
main sequence or (B) a 
set of 39 aligned homeo- 
domiin sequences {JS). 
The bars indicitc the n- 
ecru of the helical re- 
gtOfU identified in nuclc' 
ar magnetic rcsonirae 
studies of the Antcnna- 
pcdu honvcodomiin 
To determine hy> 
drophobtc moments, 
Rsiduca were uiignod 
ra one of three groups: 
HI (high hjrdiophobici- 
ty - Trp, Ik, Phe, Leu, 
Met. VaJ, or Cys); H2 
(medium hydrophobic' 
ity o Tyr, Pro, Ais, Thr, 
Hb, Cly, or Set); ind H3 (low hydrophobicity «'Cln, Am, Glu. Asp, Lryi, 
Of Arg). For the aligned homeodomain sequcnea, the residues at caeh 
position were lorted by their hydrophobicity oy using the scale of Fauchere 
ind Ptiika (41). Arg snd Lys were not counted unless no odicr 'residue was 
found at the position, becauK they contain tong aUphaiic side chains and can 
thereby lubstinnc Tor nonpohr raiducs ai some buried sites. To accoum for 
pouibfc scqurnce errors ind rare ciccpitons, the mon hydrophilic rtsidue 
illowcd at each position was discarded unleu if wu observed cwicc. The 
second moss hydrophilic raidue was then chosen to represent the hydropho* 
biciiy of each position. An cisht<naidue. window was used and the vccton 
proicaed radtaOy every 100*. The vector maminides were assigned a value of 
1, 0, or - 1 for positioru where the hydrophobtdty group was HI, H2, or 
H3, icspectivdy. 
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fiB. 5. A tcpicKimtipn of oiw com* 
psn conforinttion for • paitieulai 
tcquoicc of H and P raiducs on a 
rwcduTKruionai aquarc Uratx, 
(Adapted ftwn {40}, widi pcimii- 
lion of chc American Chcnuctl Sod- 
eijr] 



lurficc poiitions. The a/nphipathic pancrm chat emerge can be used 
ro identify probable regiony of secondary imiciurc. Third, incorpo- 
radng a lunowlcdgc of allowed aubinniciom can improve chc ability 
CO detect and align distantly related proteins bocausr the cs^scntbl 
residues can be given prominence in the alignmcnc scoring. 
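The hydrophobic-moment procedure given in the Fig. 4 caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' program. The three groups and their values (+1, 0, -1), the special handling of Arg and Lys, the eight-residue window, and the 100° projection all come from the caption. Three things are our own assumptions. First, the within-group ordering is taken from the order in which the caption lists residues and is assumed to approximate the Fauchère and Pliska ranking. Second, "observed twice" is read as "at least twice." Third, the function names `column_value` and `hydrophobic_moments` are hypothetical.

```python
import math
from collections import Counter

# Groups from the Fig. 4 caption, most to least hydrophobic in the caption's
# listing order (ASSUMED to approximate the Fauchere-Pliska ranking).
H1, H2, H3 = "WIFLMVC", "YPATHGS", "QNEDKR"
RANK = H1 + H2 + H3  # index 0 = most hydrophobic, last = most hydrophilic
VALUE = {aa: v for group, v in ((H1, 1), (H2, 0), (H3, -1)) for aa in group}


def column_value(column):
    """Return +1, 0 or -1 for one aligned position (a string of residues; gaps ignored)."""
    counts = Counter(aa for aa in column if aa in VALUE)
    if not counts:
        raise ValueError("no residues at this position")
    # Arg and Lys count only when nothing else occurs at the position.
    others = {aa: c for aa, c in counts.items() if aa not in "KR"}
    if others:
        counts = others
    # Most hydrophilic residue first.
    ordered = sorted(counts, key=RANK.index, reverse=True)
    # Discard the most hydrophilic residue if it was seen only once
    # (ASSUMPTION: "observed twice" means at least twice).
    if len(ordered) > 1 and counts[ordered[0]] < 2:
        ordered = ordered[1:]
    return VALUE[ordered[0]]


def hydrophobic_moments(values, window=8, angle_deg=100.0):
    """Magnitude of sum_k h_k * (cos k*delta, sin k*delta) for each window."""
    delta = math.radians(angle_deg)
    moments = []
    for start in range(len(values) - window + 1):
        segment = values[start:start + window]
        x = sum(h * math.cos(k * delta) for k, h in enumerate(segment))
        y = sum(h * math.sin(k * delta) for k, h in enumerate(segment))
        moments.append(math.hypot(x, y))
    return moments
```

For a single sequence, as in panel (A), the input would be `[VALUE[aa] for aa in sequence]`. For an alignment, as in panel (B), it would be `[column_value(col) for col in columns]`, where `columns` holds the residues found at each aligned position. Numbering angles from the start of each window rather than from the start of the chain does not change the result, because rotating every vector by the same angle leaves the magnitude of their sum unchanged.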

As more sequences are determined, it becomes increasingly likely
that a protein of interest is a member of a family of related
sequences. If this is not the case, it is now possible to use genetic
methods to generate lists of allowed amino acid substitutions.
Consequently, at least in the short term, it may not be necessary to
solve the folding problem for individual protein sequences. Instead,
information from sequence sets could be used. Perhaps by simplifying
sequence space through the identification of key residues, and by
simplifying conformation space, as in the lattice methods, it will be
possible to develop algorithms to generate a limited number of trial
structures. These trial structures could then, in turn, be evaluated by
further experiments and more sophisticated energy calculations.